Part 3.1: Working With Simple Tables

Quick Summary

In this tutorial, you will:

  • model non-time-series data with SimpleTable
  • build a small three-table relation: customers, balances, and debts
  • attach those tables to SimpleTableUpdater classes
  • insert, read, sparse-upsert, and delete rows
  • run typed filters, including join filters
  • inspect the resulting backend tables from the CLI

The concrete example for this tutorial lives in examples/data_nodes/simple_tables.py.

Before going further, it is important to frame the feature correctly.

As the name suggests, SimpleTable is designed to create small ORM-like tables inside a project, with a heavily reduced database configuration compared with a full application ORM. The spirit of MainSequence is fast iteration inside a normalized working environment: define a schema, attach it to an updater, persist rows, and query them back without having to design a full database layer.

That makes SimpleTable a good fit for:

  • reference data
  • small relational record sets inside a project
  • application-facing tables that support lightweight insert, read, update, and delete workflows

It is not meant to replace a full ORM or a full database application model. If you need more complex transactional behavior, richer relational modeling, custom migration workflows, or a broader application database layer, you should not rely on SimpleTable for that.

1. When To Use SimpleTable

Use SimpleTable when your data is row-oriented but not naturally modeled as a time series.

Good examples:

  • customers
  • counterparties
  • portfolios
  • mapping tables
  • balance snapshots where time is just another column

If your data is fundamentally organized around time_index and unique_identifier, you usually want a DataNode table instead.

SimpleTable still fits into the same graph-oriented MainSequence workflow:

  • it has a schema
  • it has an updater
  • it has update_hash and storage_hash
  • it can depend on other updaters

2. Important: id Is Backend-Managed

Important

Every simple-table row has an id, but you must not declare that field yourself.

Why:

  • the backend assigns it
  • it must not participate in schema hashing
  • allowing users to define it would create collisions

Correct:

class CustomerRecord(SimpleTable):
    # no id field: the backend assigns it on insert
    customer_code: Annotated[str, Index(unique=True), Ops(filter=True, order=True)] = Field(...)

Wrong:

class CustomerRecord(SimpleTable):
    id: int  # wrong: id is backend-managed and must not be declared in the schema
    customer_code: Annotated[str, Index(unique=True), Ops(filter=True, order=True)] = Field(...)

The normal lifecycle is:

  1. insert rows without id
  2. read them back
  3. use the returned id for sparse upserts or deletes

This is especially important for foreign-key workflows. Downstream rows usually need parent ids that were assigned by the backend during earlier inserts.
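
A minimal sketch of that lifecycle, assuming the updater instance and filter helpers introduced later in this tutorial:

# 1. Insert a row without an id; the backend assigns one.
customers_updater.insert(
    CustomerRecord(customer_code="ACME", name="Acme Capital", region="US")
)

# 2. Read the row back; it now carries the backend-assigned id.
rows = customers_updater.execute_filter(
    CustomerRecord.filters.customer_code.eq("ACME")
)
acme_id = rows[0].id

# 3. Use that id for later sparse upserts or deletes.
customers_updater.delete(acme_id)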

Warning

Index(unique=True) helps with lookup and constraints, but it is not the key used for overwrite/upsert writes.

If update() returns (records, True), overwrite is keyed by the backend-managed row id.

That means:

  • Index(unique=True) does not make a field the write key for overwrite
  • returning (records, True) requires id-populated rows
  • if your business key is something like customer_code, first read existing rows, map customer_code -> id, insert any missing rows, then return rows that include those ids

3. Define the Table Schemas

The tutorial example uses three tables:

  • CustomerRecord
  • CustomerBalanceRecord
  • CustomerDebtRecord

import datetime
from typing import Annotated

from pydantic import Field

from mainsequence.tdag.simple_tables import ForeignKey, Index, Ops, SimpleTable


class CustomerRecord(SimpleTable):
    customer_code: Annotated[str, Index(unique=True), Ops(filter=True, order=True)] = Field(
        ...,
        title="Customer Code",
        description="Stable customer identifier.",
    )
    name: Annotated[str, Ops(filter=True, order=True)] = Field(
        ...,
        title="Name",
        description="Human-readable customer name.",
    )
    region: Annotated[str, Index(), Ops(filter=True)] = Field(
        ...,
        title="Region",
        description="Commercial region for the customer.",
    )


class CustomerBalanceRecord(SimpleTable):
    customer_id: Annotated[
        int,
        ForeignKey("customers", on_delete="cascade"),
        Index(),
        Ops(filter=True),
    ] = Field(
        ...,
        title="Customer Id",
        description="Foreign key to the customer table.",
    )
    as_of_date: Annotated[datetime.date, Ops(filter=True, order=True)] = Field(
        ...,
        title="As Of Date",
        description="Balance snapshot date.",
    )
    balance_usd: Annotated[float, Ops(filter=True, order=True)] = Field(
        ...,
        title="Balance USD",
        description="Customer balance in USD.",
    )


class CustomerDebtRecord(SimpleTable):
    balance_id: Annotated[
        int,
        ForeignKey("balances", on_delete="cascade"),
        Index(),
        Ops(filter=True),
    ] = Field(
        ...,
        title="Balance Id",
        description="Foreign key to the balance snapshot row.",
    )
    debt_type: Annotated[str, Ops(filter=True, order=True)] = Field(
        ...,
        title="Debt Type",
        description="Debt category associated with the balance row.",
    )
    debt_usd: Annotated[float, Ops(filter=True, order=True)] = Field(
        ...,
        title="Debt USD",
        description="Debt amount in USD.",
    )

What each schema feature is doing:

  • Annotated[...] keeps the field type and metadata together
  • Index(...) declares useful read paths
  • Ops(...) declares which fields are filterable or orderable
  • ForeignKey(...) declares the relation to a dependency updater key

For a deeper explanation of the schema DSL, see docs/knowledge/simple_tables/simple_table.md.

4. Attach Each Table To an Updater

SimpleTable defines the schema. SimpleTableUpdater owns the real backend table and the actual read/write workflow.

Insert-only vs overwrite/upsert in SimpleTableUpdater.update()

SimpleTableUpdater.update() can return either:

  • records
  • (records, overwrite)

If you only return records, the updater behaves like overwrite=False.

Use overwrite=False when you are inserting new rows that do not have backend ids yet. That is the safe default for seed-style examples.

Use overwrite=True only when the returned rows already include backend-managed ids. In other words, overwrite/upsert is not keyed by customer_code, unique_identifier, or any other business key unless you first resolve that business key back to the backend id.

The first updater example below is intentionally insert-only for that reason.

The customer updater is the simplest case:

class CustomersUpdater(SimpleTableUpdater):
    SIMPLE_TABLE_SCHEMA = CustomerRecord

    @staticmethod
    def build_seed_rows() -> list[dict[str, object]]:
        return [
            {"customer_code": "ACME", "name": "Acme Capital", "region": "US"},
            {"customer_code": "BETA", "name": "Beta Treasury", "region": "EU"},
        ]

    def update(self) -> tuple[list[CustomerRecord], bool]:
        return (
            [CustomerRecord.model_validate(row) for row in self.build_seed_rows()],
            False,
        )

This updater is seeding new customer rows. Those rows do not carry backend ids yet, so returning False is the correct behavior.

If you wanted an id-aware overwrite path instead, the pattern would be:

def update(self) -> tuple[list[CustomerRecord], bool]:
    seed_rows = self.build_seed_rows()
    customer_codes = [row["customer_code"] for row in seed_rows]

    existing_rows = self.execute_filter(
        CustomerRecord.filters.customer_code.in_(customer_codes),
        limit=len(customer_codes),
    )
    customer_id_by_code = {
        row.customer_code: row.id
        for row in existing_rows
        if row.id is not None
    }

    missing_rows = [
        row
        for row in seed_rows
        if row["customer_code"] not in customer_id_by_code
    ]
    if missing_rows:
        self.insert_records(
            [CustomerRecord.model_validate(row) for row in missing_rows]
        )
        existing_rows = self.execute_filter(
            CustomerRecord.filters.customer_code.in_(customer_codes),
            limit=len(customer_codes),
        )
        customer_id_by_code = {
            row.customer_code: row.id
            for row in existing_rows
            if row.id is not None
        }

    return (
        [
            CustomerRecord.model_validate(
                {"id": customer_id_by_code[row["customer_code"]], **row}
            )
            for row in seed_rows
        ],
        True,
    )

The balance updater depends on the customer updater:

class CustomerBalancesUpdater(SimpleTableUpdater):
    SIMPLE_TABLE_SCHEMA = CustomerBalanceRecord

    def __init__(
        self,
        configuration: CustomerBalancesUpdaterConfiguration,
        *,
        customers_updater: CustomersUpdater | None = None,
        hash_namespace: str | None = None,
        test_node: bool = False,
    ):
        self.customers_updater = customers_updater or CustomersUpdater(
            configuration=CustomersUpdaterConfiguration()
        )
        super().__init__(
            configuration=configuration,
            hash_namespace=hash_namespace,
            test_node=test_node,
        )

Its update() method first reads customer rows back from the backend, because those rows now carry the backend-assigned id values needed for customer_id.

That is the key architectural shift relative to the old user-defined-id model:

  • parent rows are inserted first
  • parent rows are queried back
  • downstream tables use the returned ids as foreign keys

The debt updater follows the same pattern, but one step lower:

  • it queries customers to map customer_code -> customer.id
  • it queries balances to map (customer_id, as_of_date) -> balance.id
  • then it builds CustomerDebtRecord rows with balance_id
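
A minimal sketch of that update() body, reusing the filter helpers from the customer example above (see the full version in the example file); the codes, date, and amounts are illustrative:

def update(self) -> tuple[list[CustomerDebtRecord], bool]:
    # Map customer_code -> customer.id via the upstream customers table.
    customers = self.balances_updater.customers_updater.execute_filter(
        CustomerRecord.filters.customer_code.in_(["ACME", "BETA"])
    )
    customer_id_by_code = {row.customer_code: row.id for row in customers}

    # Map (customer_id, as_of_date) -> balance.id via the balances table.
    balances = self.balances_updater.execute_filter(
        CustomerBalanceRecord.filters.customer_id.in_(
            list(customer_id_by_code.values())
        )
    )
    balance_id_by_key = {(row.customer_id, row.as_of_date): row.id for row in balances}

    # Build debt rows keyed by the backend-assigned balance ids.
    as_of = datetime.date(2026, 3, 22)
    debt_rows = [
        CustomerDebtRecord(
            balance_id=balance_id_by_key[(customer_id_by_code["ACME"], as_of)],
            debt_type="margin",
            debt_usd=12_500.0,
        )
    ]
    return debt_rows, False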

5. Build the Remote Tables

The example creates the three updaters and builds their remote objects:

customers_updater = CustomersUpdater(configuration=CustomersUpdaterConfiguration())
balances_updater = CustomerBalancesUpdater(
    configuration=CustomerBalancesUpdaterConfiguration(),
    customers_updater=customers_updater,
)
debts_updater = CustomerDebtsUpdater(
    configuration=CustomerDebtsUpdaterConfiguration(),
    balances_updater=balances_updater,
)

customers_updater.verify_and_build_remote_objects()
balances_updater.verify_and_build_remote_objects()
debts_updater.verify_and_build_remote_objects()

At that point each updater has:

  • its own update_hash
  • its own storage_hash
  • a resolved backend storage object

The example prints all three so you can see the distinction clearly.
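
For example, a quick sketch that prints the identifiers, assuming update_hash and storage_hash are plain attributes on each updater:

for updater in (customers_updater, balances_updater, debts_updater):
    print(type(updater).__name__, updater.update_hash, updater.storage_hash)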

6. Insert Rows

The example inserts customer rows first:

for record in customer_seed:
    customers_updater.insert(record)

Then it reads them back:

inserted_customers = customers_updater.execute_filter(
    CustomerRecord.filters.customer_code.in_(customer_codes)
)

Those returned rows now include backend ids. The example builds:

customer_id_by_code = {
    record.customer_code: record.id
    for record in inserted_customers
    if record.id is not None
}

and uses that mapping to create the balance rows.

This is the intended pattern whenever downstream rows depend on backend-assigned ids.
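
A sketch of that step, with illustrative dates and amounts:

balance_seed = [
    CustomerBalanceRecord(
        customer_id=customer_id_by_code["ACME"],
        as_of_date=datetime.date(2026, 3, 22),
        balance_usd=125_000.0,
    ),
    CustomerBalanceRecord(
        customer_id=customer_id_by_code["BETA"],
        as_of_date=datetime.date(2026, 3, 22),
        balance_usd=87_500.0,
    ),
]
for record in balance_seed:
    balances_updater.insert(record)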

7. Read Rows Back With Typed Filters

Simple-table filters are built from the table class and executed through the updater.

Example:

us_customers = customers_updater.execute_filter(
    CustomerRecord.filters.region.eq("US")
)

More specific reads can combine conditions:

from mainsequence.tdag.simple_tables import and_

balance_rows = balances_updater.execute_filter(
    and_(
        CustomerBalanceRecord.filters.customer_id.in_(list(customer_id_by_code.values())),
        CustomerBalanceRecord.filters.as_of_date.eq(datetime.date(2026, 3, 22)),
    )
)

For a deeper filtering guide, see docs/knowledge/simple_tables/filtering.md.

8. Sparse Upsert Existing Rows

After the balance rows are read back, the example uses the returned backend ids for a sparse upsert:

SimpleTableStorage.upsert_records_into_table(
    simple_table_id=balances_updater.data_node_storage.id,
    records=[
        {
            "id": inserted_balance_by_customer_id[customer_id_by_code["ACME"]].id,
            "balance_usd": 150_000.0,
        },
        {
            "id": inserted_balance_by_customer_id[customer_id_by_code["BETA"]].id,
            "balance_usd": 91_250.0,
        },
    ],
)

This is exactly the workflow that makes backend-managed ids workable:

  • insert first
  • read back ids
  • upsert by id later

In other words, you do not need a user-declared primary key in the schema to mutate rows after they exist. You do, however, need the backend id before using overwrite/upsert payloads.
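
As a quick check, you can read the mutated rows back with the same typed filters from section 7 (a sketch; the amounts match the upsert payload above):

updated_balances = balances_updater.execute_filter(
    CustomerBalanceRecord.filters.customer_id.in_(
        list(customer_id_by_code.values())
    )
)
for row in updated_balances:
    print(row.customer_id, row.balance_usd)  # 150_000.0 and 91_250.0 after the upsert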

9. Delete Rows

Delete also uses the backend id returned by earlier reads:

for record in inserted_debts:
    debts_updater.delete(record.id)

The example then re-queries the table and asserts that the demo rows are gone.
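
A sketch of that re-query, assuming the demo debt rows used the illustrative margin debt type:

remaining_debts = debts_updater.execute_filter(
    CustomerDebtRecord.filters.debt_type.eq("margin")
)
assert not remaining_debts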

That same pattern is repeated for:

  • debt rows
  • balance rows
  • customer rows

10. Build Join Filters

Single-table filters are enough for many workflows, but the example also shows how to build join filters.

First resolve the runtime tables:

resolved_balances = balances_updater.resolve_table()
resolved_customers = customers_updater.resolve_table()
resolved_debts = debts_updater.resolve_table()

Then create join aliases:

customer_join = resolved_balances.join("customer", target=resolved_customers)
debt_join = resolved_balances.join("debt", target=resolved_debts)

Then build a combined filter:

from mainsequence.tdag.simple_tables import and_, or_

complex_filter = and_(
    CustomerBalanceRecord.filters.as_of_date.between(
        datetime.date(2026, 3, 1),
        datetime.date(2026, 3, 31),
    ),
    or_(
        customer_join.filters.region.eq("US"),
        debt_join.filters.debt_type.eq("margin"),
    ),
    or_(
        debt_join.filters.debt_usd.gte(10_000.0),
        CustomerBalanceRecord.filters.balance_usd.gte(140_000.0),
    ),
)

rows = balances_updater.execute_filter(
    complex_filter,
    joins=[customer_join, debt_join],
)

Important details:

  • customer and debt are aliases
  • the updater still executes the read
  • the returned rows are still CustomerBalanceRecord rows
  • joins are used to constrain which base rows match

11. Run the End-to-End Example

The example file exposes:

build_test_simple_tables()

That helper does all of the following in one repeatable run:

  • builds the three remote tables
  • clears previous demo rows
  • inserts customers, balances, and debts
  • reads them back and verifies the inserted data
  • performs a sparse upsert on balances
  • reads the balances back and verifies the changes
  • deletes the demo rows
  • reads again and verifies that the rows are gone
  • prints example request payloads for typed filters and join filters

If your .env is configured for a live backend, you can run:

.venv/bin/python examples/data_nodes/simple_tables.py

12. Inspect the Tables From the CLI

The CLI gives you storage-level visibility into the resulting simple tables.

List simple-table storages:

mainsequence simple_table list

Inspect one storage:

mainsequence simple_table detail 41

Delete one storage:

mainsequence simple_table delete 41

These commands operate on SimpleTableStorage, not on individual row ids.

They are useful for:

  • inspecting the compiled schema
  • reviewing column and relation metadata
  • checking the build configuration used for the storage

13. How SimpleTable Fits With DataNode

SimpleTable is not a separate world. It is a different table model inside the same dependency system.

That means you can:

  • use simple tables for master/reference data
  • use data nodes for time-series analytics
  • wire them into the same application or project workflow

Typical pattern:

  • simple tables hold stable entities and relationships
  • data nodes compute normalized analytical outputs that depend on them

Or the reverse:

  • a data pipeline computes something
  • a simple table exposes the application-facing records built from that output

14. Next Steps

After this tutorial, the most useful follow-ups are: