Part 3.1: Working With Simple Tables

Quick Summary

In this tutorial, you will:

  • model non-time-series data with SimpleTable
  • build a small three-table relation: customers, balances, and debts
  • attach those tables to SimpleTableUpdater classes
  • insert, read, sparse-upsert, and delete rows
  • run typed filters, including join filters
  • inspect the resulting backend tables from the CLI

The concrete example for this tutorial lives in examples/data_nodes/simple_tables.py.

Before going further, it is important to frame the feature correctly.

As the name suggests, SimpleTable is designed to create small ORM-like tables inside a project, with a heavily reduced database configuration compared with a full application ORM. The spirit of MainSequence is fast iteration inside a normalized working environment: define a schema, attach it to an updater, persist rows, and query them back without having to design a full database layer.

That makes SimpleTable a good fit for:

  • reference data
  • small relational record sets inside a project
  • application-facing tables that support lightweight insert, read, update, and delete workflows

It is not meant to replace a full ORM or a full database application model. If you need more complex transactional behavior, richer relational modeling, custom migration workflows, or a broader application database layer, you should not rely on SimpleTable for that.

1. When To Use SimpleTable

Use SimpleTable when your data is row-oriented but not naturally modeled as a time series.

Good examples:

  • customers
  • counterparties
  • portfolios
  • mapping tables
  • balance snapshots where time is just another column

If your data is fundamentally organized around time_index and unique_identifier, you usually want a DataNode table instead.

SimpleTable still fits into the same graph-oriented MainSequence workflow:

  • it has a schema
  • it has an updater
  • it has update_hash and storage_hash
  • it can depend on other updaters

2. Important: id Is Backend-Managed

Important

Every simple-table row has an id, but you must not declare that field yourself.

Why:

  • the backend assigns it
  • it must not participate in schema hashing
  • allowing users to define it would create collisions

Correct:

class CustomerRecord(SimpleTable):
    # no id field: the backend assigns it on insert
    customer_code: Annotated[str, Index(unique=True), Ops(filter=True, order=True)] = Field(...)

Wrong:

class CustomerRecord(SimpleTable):
    id: int  # wrong: id is backend-managed and must not be declared in the schema
    customer_code: Annotated[str, Index(unique=True), Ops(filter=True, order=True)] = Field(...)

The normal lifecycle is:

  1. insert rows without id
  2. read them back
  3. use the returned id for sparse upserts or deletes

This is especially important for foreign-key workflows. Downstream rows usually need parent ids that were assigned by the backend during earlier inserts.
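
A minimal sketch of that lifecycle, assuming the updater instance and filter helpers introduced later in this tutorial:

# 1. Insert a row without an id; the backend assigns one.
customers_updater.insert(
    CustomerRecord(customer_code="ACME", name="Acme Capital", region="US")
)

# 2. Read the row back; it now carries the backend-assigned id.
rows = customers_updater.execute_filter(
    CustomerRecord.filters.customer_code.eq("ACME")
)
acme_id = rows[0].id

# 3. Use that id for later sparse upserts or deletes.
customers_updater.delete(acme_id)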

Warning

Index(unique=True) helps with lookup and constraints, but it is not the key used for overwrite/upsert writes.

If update() returns (records, True), overwrite is keyed by the backend-managed row id.

That means:

  • Index(unique=True) does not make a field the write key for overwrite
  • returning (records, True) requires id-populated rows
  • if your business key is something like customer_code, first read existing rows, map customer_code -> id, insert any missing rows, then return rows that include those ids

3. Define the Table Schemas

The tutorial example uses three tables:

  • CustomerRecord
  • CustomerBalanceRecord
  • CustomerDebtRecord

import datetime
from typing import Annotated

from pydantic import Field

from mainsequence.tdag.simple_tables import ForeignKey, Index, Ops, SimpleTable


class CustomerRecord(SimpleTable):
    customer_code: Annotated[str, Index(unique=True), Ops(filter=True, order=True)] = Field(
        ...,
        title="Customer Code",
        description="Stable customer identifier.",
    )
    name: Annotated[str, Ops(filter=True, order=True)] = Field(
        ...,
        title="Name",
        description="Human-readable customer name.",
    )
    region: Annotated[str, Index(), Ops(filter=True)] = Field(
        ...,
        title="Region",
        description="Commercial region for the customer.",
    )


class CustomerBalanceRecord(SimpleTable):
    customer_id: Annotated[
        int,
        ForeignKey("customers", on_delete="cascade"),
        Index(),
        Ops(filter=True),
    ] = Field(
        ...,
        title="Customer Id",
        description="Foreign key to the customer table.",
    )
    as_of_date: Annotated[datetime.date, Ops(filter=True, order=True)] = Field(
        ...,
        title="As Of Date",
        description="Balance snapshot date.",
    )
    balance_usd: Annotated[float, Ops(filter=True, order=True)] = Field(
        ...,
        title="Balance USD",
        description="Customer balance in USD.",
    )


class CustomerDebtRecord(SimpleTable):
    balance_id: Annotated[
        int,
        ForeignKey("balances", on_delete="cascade"),
        Index(),
        Ops(filter=True),
    ] = Field(
        ...,
        title="Balance Id",
        description="Foreign key to the balance snapshot row.",
    )
    debt_type: Annotated[str, Ops(filter=True, order=True)] = Field(
        ...,
        title="Debt Type",
        description="Debt category associated with the balance row.",
    )
    debt_usd: Annotated[float, Ops(filter=True, order=True)] = Field(
        ...,
        title="Debt USD",
        description="Debt amount in USD.",
    )

What each schema feature is doing:

  • Annotated[...] keeps the field type and metadata together
  • Index(...) declares useful read paths
  • Ops(...) declares which fields are filterable or orderable
  • ForeignKey(...) declares the relation to a dependency updater key

For a deeper explanation of the schema DSL, see docs/knowledge/simple_tables/simple_table.md.

4. Attach Each Table To an Updater

SimpleTable defines the schema. SimpleTableUpdater owns the real backend table and the actual read/write workflow.

Insert-only vs overwrite/upsert in SimpleTableUpdater.update()

SimpleTableUpdater.update() can return either:

  • records
  • (records, overwrite)

If you only return records, the updater behaves like overwrite=False.

Use overwrite=False when you are inserting new rows that do not have backend ids yet. That is the safe default for seed-style examples.

Use overwrite=True only when the returned rows already include backend-managed ids. In other words, overwrite/upsert is not keyed by customer_code, unique_identifier, or any other business key unless you first resolve that business key back to the backend id.

The first updater example below is intentionally insert-only for that reason.

The customer updater is the simplest case:

class CustomersUpdater(SimpleTableUpdater):
    SIMPLE_TABLE_SCHEMA = CustomerRecord

    @staticmethod
    def build_seed_rows() -> list[dict[str, object]]:
        return [
            {"customer_code": "ACME", "name": "Acme Capital", "region": "US"},
            {"customer_code": "BETA", "name": "Beta Treasury", "region": "EU"},
        ]

    def update(self) -> tuple[list[CustomerRecord], bool]:
        return (
            [CustomerRecord.model_validate(row) for row in self.build_seed_rows()],
            False,
        )

This updater is seeding new customer rows. Those rows do not carry backend ids yet, so returning False is the correct behavior.

If you wanted an id-aware overwrite path instead, the pattern would be:

def update(self) -> tuple[list[CustomerRecord], bool]:
    seed_rows = self.build_seed_rows()
    customer_codes = [row["customer_code"] for row in seed_rows]

    existing_rows = self.execute_filter(
        CustomerRecord.filters.customer_code.in_(customer_codes),
        limit=len(customer_codes),
    )
    customer_id_by_code = {
        row.customer_code: row.id
        for row in existing_rows
        if row.id is not None
    }

    missing_rows = [
        row
        for row in seed_rows
        if row["customer_code"] not in customer_id_by_code
    ]
    if missing_rows:
        self.insert_records(
            [CustomerRecord.model_validate(row) for row in missing_rows]
        )
        existing_rows = self.execute_filter(
            CustomerRecord.filters.customer_code.in_(customer_codes),
            limit=len(customer_codes),
        )
        customer_id_by_code = {
            row.customer_code: row.id
            for row in existing_rows
            if row.id is not None
        }

    return (
        [
            CustomerRecord.model_validate(
                {"id": customer_id_by_code[row["customer_code"]], **row}
            )
            for row in seed_rows
        ],
        True,
    )

The balance updater depends on the customer updater:

class CustomerBalancesUpdater(SimpleTableUpdater):
    SIMPLE_TABLE_SCHEMA = CustomerBalanceRecord

    def __init__(
        self,
        configuration: CustomerBalancesUpdaterConfiguration,
        *,
        customers_updater: CustomersUpdater | None = None,
        hash_namespace: str | None = None,
        test_node: bool = False,
    ):
        self.customers_updater = customers_updater or CustomersUpdater(
            configuration=CustomersUpdaterConfiguration()
        )
        super().__init__(
            configuration=configuration,
            hash_namespace=hash_namespace,
            test_node=test_node,
        )

Its update() method first reads customer rows back from the backend, because those rows now carry the backend-assigned id values needed for customer_id.

That is the key architectural shift relative to the old user-defined-id model:

  • parent rows are inserted first
  • parent rows are queried back
  • downstream tables use the returned ids as foreign keys

The debt updater follows the same pattern, but one step lower:

  • it queries customers to map customer_code -> customer.id
  • it queries balances to map (customer_id, as_of_date) -> balance.id
  • then it builds CustomerDebtRecord rows with balance_id
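
A minimal sketch of that update() body, reusing the filter helpers from the customer example above (see the full version in the example file); the codes, date, and amounts are illustrative:

def update(self) -> tuple[list[CustomerDebtRecord], bool]:
    # Map customer_code -> customer.id via the upstream customers table.
    customers = self.balances_updater.customers_updater.execute_filter(
        CustomerRecord.filters.customer_code.in_(["ACME", "BETA"])
    )
    customer_id_by_code = {row.customer_code: row.id for row in customers}

    # Map (customer_id, as_of_date) -> balance.id via the balances table.
    balances = self.balances_updater.execute_filter(
        CustomerBalanceRecord.filters.customer_id.in_(
            list(customer_id_by_code.values())
        )
    )
    balance_id_by_key = {(row.customer_id, row.as_of_date): row.id for row in balances}

    # Build debt rows keyed by the backend-assigned balance ids.
    as_of = datetime.date(2026, 3, 22)
    debt_rows = [
        CustomerDebtRecord(
            balance_id=balance_id_by_key[(customer_id_by_code["ACME"], as_of)],
            debt_type="margin",
            debt_usd=12_500.0,
        )
    ]
    return debt_rows, False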

5. Build the Remote Tables

The example creates the three updaters and builds their remote objects:

customers_updater = CustomersUpdater(configuration=CustomersUpdaterConfiguration())
balances_updater = CustomerBalancesUpdater(
    configuration=CustomerBalancesUpdaterConfiguration(),
    customers_updater=customers_updater,
)
debts_updater = CustomerDebtsUpdater(
    configuration=CustomerDebtsUpdaterConfiguration(),
    balances_updater=balances_updater,
)

customers_updater.verify_and_build_remote_objects()
balances_updater.verify_and_build_remote_objects()
debts_updater.verify_and_build_remote_objects()

At that point each updater has:

  • its own update_hash
  • its own storage_hash
  • a resolved backend storage object

The example prints all three so you can see the distinction clearly.
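
For example, a quick sketch that prints the identifiers, assuming update_hash and storage_hash are plain attributes on each updater:

for updater in (customers_updater, balances_updater, debts_updater):
    print(type(updater).__name__, updater.update_hash, updater.storage_hash)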

6. Insert Rows

The example inserts customer rows first:

for record in customer_seed:
    customers_updater.insert(record)

Then it reads them back:

inserted_customers = customers_updater.execute_filter(
    CustomerRecord.filters.customer_code.in_(customer_codes)
)

Those returned rows now include backend ids. The example builds:

customer_id_by_code = {
    record.customer_code: record.id
    for record in inserted_customers
    if record.id is not None
}

and uses that mapping to create the balance rows.

This is the intended pattern whenever downstream rows depend on backend-assigned ids.
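
A sketch of that step, with illustrative dates and amounts:

balance_seed = [
    CustomerBalanceRecord(
        customer_id=customer_id_by_code["ACME"],
        as_of_date=datetime.date(2026, 3, 22),
        balance_usd=125_000.0,
    ),
    CustomerBalanceRecord(
        customer_id=customer_id_by_code["BETA"],
        as_of_date=datetime.date(2026, 3, 22),
        balance_usd=87_500.0,
    ),
]
for record in balance_seed:
    balances_updater.insert(record)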

7. Read Rows Back With Typed Filters

Simple-table filters are built from the table class and executed through the updater.

Example:

us_customers = customers_updater.execute_filter(
    CustomerRecord.filters.region.eq("US")
)

More specific reads can combine conditions:

from mainsequence.tdag.simple_tables import and_

balance_rows = balances_updater.execute_filter(
    and_(
        CustomerBalanceRecord.filters.customer_id.in_(list(customer_id_by_code.values())),
        CustomerBalanceRecord.filters.as_of_date.eq(datetime.date(2026, 3, 22)),
    )
)

For a deeper filtering guide, see docs/knowledge/simple_tables/filtering.md.

8. Sparse Upsert Existing Rows

After the balance rows are read back, the example uses the returned backend ids for a sparse upsert:

SimpleTableStorage.upsert_records_into_table(
    simple_table_id=balances_updater.data_node_storage.id,
    records=[
        {
            "id": inserted_balance_by_customer_id[customer_id_by_code["ACME"]].id,
            "balance_usd": 150_000.0,
        },
        {
            "id": inserted_balance_by_customer_id[customer_id_by_code["BETA"]].id,
            "balance_usd": 91_250.0,
        },
    ],
)

This is exactly the workflow that makes backend-managed ids workable:

  • insert first
  • read back ids
  • upsert by id later

In other words, you do not need a user-declared primary key in the schema to mutate rows after they exist. You do, however, need the backend id before using overwrite/upsert payloads.
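
As a quick check, you can read the mutated rows back with the same typed filters from section 7 (a sketch; the amounts match the upsert payload above):

updated_balances = balances_updater.execute_filter(
    CustomerBalanceRecord.filters.customer_id.in_(
        list(customer_id_by_code.values())
    )
)
for row in updated_balances:
    print(row.customer_id, row.balance_usd)  # 150_000.0 and 91_250.0 after the upsert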

9. Delete Rows

Delete also uses the backend id returned by earlier reads:

for record in inserted_debts:
    debts_updater.delete(record.id)

The example then re-queries the table and asserts that the demo rows are gone.
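
A sketch of that re-query, assuming the demo debt rows used the illustrative margin debt type:

remaining_debts = debts_updater.execute_filter(
    CustomerDebtRecord.filters.debt_type.eq("margin")
)
assert not remaining_debts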

That same pattern is repeated for:

  • debt rows
  • balance rows
  • customer rows

10. Build Join Filters

Single-table filters are enough for many workflows, but the example also shows how to build join filters.

First resolve the runtime tables:

resolved_balances = balances_updater.resolve_table()
resolved_customers = customers_updater.resolve_table()
resolved_debts = debts_updater.resolve_table()

Then create join aliases:

customer_join = resolved_balances.join("customer", target=resolved_customers)
debt_join = resolved_balances.join("debt", target=resolved_debts)

Then build a combined filter:

from mainsequence.tdag.simple_tables import and_, or_

complex_filter = and_(
    CustomerBalanceRecord.filters.as_of_date.between(
        datetime.date(2026, 3, 1),
        datetime.date(2026, 3, 31),
    ),
    or_(
        customer_join.filters.region.eq("US"),
        debt_join.filters.debt_type.eq("margin"),
    ),
    or_(
        debt_join.filters.debt_usd.gte(10_000.0),
        CustomerBalanceRecord.filters.balance_usd.gte(140_000.0),
    ),
)

rows = balances_updater.execute_filter(
    complex_filter,
    joins=[customer_join, debt_join],
)

Important details:

  • customer and debt are aliases
  • the updater still executes the read
  • the returned rows are still CustomerBalanceRecord rows
  • joins are used to constrain which base rows match

11. Run the End-to-End Example

The example file exposes:

build_test_simple_tables()

That helper does all of the following in one repeatable run:

  • builds the three remote tables
  • clears previous demo rows
  • inserts customers, balances, and debts
  • reads them back and verifies the inserted data
  • performs a sparse upsert on balances
  • reads the balances back and verifies the changes
  • deletes the demo rows
  • reads again and verifies that the rows are gone
  • prints example request payloads for typed filters and join filters

If your .env is configured for a live backend, you can run:

.venv/bin/python examples/data_nodes/simple_tables.py

12. Inspect the Tables From the CLI

The CLI gives you storage-level visibility into the resulting simple tables.

List simple-table storages:

mainsequence simple_table list

Inspect one storage:

mainsequence simple_table detail 41

Delete one storage:

mainsequence simple_table delete 41

These commands operate on SimpleTableStorage, not on individual row ids.

They are useful for:

  • inspecting the compiled schema
  • reviewing column and relation metadata
  • checking the build configuration used for the storage

13. How SimpleTable Fits With DataNode

SimpleTable is not a separate world. It is a different table model inside the same dependency system.

That means you can:

  • use simple tables for master/reference data
  • use data nodes for time-series analytics
  • wire them into the same application or project workflow

Typical pattern:

  • simple tables hold stable entities and relationships
  • data nodes compute normalized analytical outputs that depend on them

Or the reverse:

  • a data pipeline computes something
  • a simple table exposes the application-facing records built from that output

14. Next Steps

After this tutorial, the most useful follow-ups are: