Getting Started with Main Sequence Part 1

This tutorial walks you through creating a project, setting it up on your local machine (Windows, macOS, or Linux), and building your first data nodes.

You can see the final repository here: https://github.com/mainsequence-sdk/TutorialProject. You can also follow the tutorial in video format here: https://www.youtube.com/watch?v=4e_5UmvX27Q&list=PLbqT9fcxsYzoC67baydxNSSVxRm1U_aVV

1. Create a Project

Log in to Main Sequence. You'll land on the Projects page. Projects help you organize work, data, and compute. Let's create the first one: choose Create New Project and name it Tutorial Project.

After a few seconds, your new project should appear with a checkmark indicating it's initialized. Click the project to open it.

On the Project Details page you'll see:
- A green status indicator confirming the project was set up correctly.
- The repository and branch (e.g., tutorial-project/main) and the latest commit.
- Two Jobs representing background processes; no action is needed for now.

2. Work on the Project Locally

We'll use Visual Studio Code for the tutorial. If you don't have it, download it from the official site. Also make sure Python 3.11 or later is installed; if it is not, download it from the official Python site and follow the installation instructions.

Setting up via VS Code Extension (Recommended)

The recommended way to work with Main Sequence projects is via the VS Code extension (it just makes things smoother), so first install the extension:

Open the Extensions view in VS Code
macOS: Press Cmd + Shift + X
Windows/Linux: Press Ctrl + Shift + X
Or click the Extensions icon in the Activity Bar on the left side of the window.
Search for the extension

In the Extensions search box, type Main Sequence and press Enter.

VS Code Extensions view showing Main Sequence

If you don’t find the extension, you can install it directly from the marketplace:
Main Sequence VS Code Extension – VS Code Marketplace

Once the extension is installed, log in to your account. You should see your project in the Projects view.
Click Set up project locally and wait a few seconds for the project to be mapped locally.

Set up project locally in Main Sequence extension

After a few seconds, refresh the Projects view and you should see your project mapped locally (in blue).
Open the project’s context menu and select Open Folder. This will open a VS Code window with your project mapped locally.

You should now see your project in the Current Project panel.

Setting up via CLI

Open a PowerShell terminal (Windows) or your preferred terminal (macOS/Linux) and enter the following commands.

First, install the Main Sequence Python package in your environment:

pip install mainsequence

With the package installed, you can use the CLI from your machine:

mainsequence --help
# or if your system does not allow automatic additions to the path
python -m mainsequence --help
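
If it is unclear which of the two invocations applies on your machine, a small check like the one below can help. It is only a sketch: the helper name suggest_cli_invocation is hypothetical, and it relies solely on the standard library to see whether an executable is on the PATH or the package is importable from the current interpreter.

```python
import importlib.util
import shutil


def suggest_cli_invocation(package: str = "mainsequence") -> str:
    """Suggest how to call the CLI based on what this interpreter can see."""
    # An executable with the package name is on the PATH: call it directly.
    if shutil.which(package):
        return f"{package} --help"
    # The package is importable but has no PATH entry: use the module form.
    if importlib.util.find_spec(package) is not None:
        return f"python -m {package} --help"
    # Neither is available: the package still needs to be installed.
    return f"pip install {package}"


print(suggest_cli_invocation())
```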

Now log in via the CLI:

mainsequence login [USER_NAME]

You should see a list of your projects:

Projects:
ID  Project                       Data Source  Class         Status     Local  Path                                                                  
--  -------                       -----------  -----         ------     -----  ----                                                                  
60  Tutorial Project              Default DB   timescale_db  AVAILABLE  —      —

The Path column is empty because the project isn't mapped locally yet. Use the project command to see your options:

mainsequence project --help

Output:

 Usage: mainsequence project [OPTIONS] COMMAND [ARGS]...                                                                                                                                                                                      

 Project commands                                                                                                                                                                                                                             

╭─ Options ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --help          Show this message and exit.                                                                                                                                                                                                │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ list                   List projects with Local status and path.                                                                                                                                                                           │
│ open                   Open the local folder in the OS file manager.                                                                                                                                                                       │
│ delete-local           Unlink the mapped folder, optionally delete it.                                                                                                                                                                     │
│ open-signed-terminal   Open a terminal window in the project directory with ssh-agent started and the repo's key added.                                                                                                                    │
│ set-up-locally         Set up project locally.                                                                                                                                                                                             │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

Map the project to your machine and list again to confirm the mapping:

mainsequence project set-up-locally [PROJECT_ID]

mainsequence project list

Output:

Windows:

ID  Project                       Data Source  Class         Status     Local  Path                                                                  
--  -------                       -----------  -----         ------     -----  ----                                                                  
60  Tutorial Project              Default DB   timescale_db  AVAILABLE  Local  C:\Users\YourName\mainsequence\my_organization\projects\tutorial-project

macOS/Linux:

ID  Project                       Data Source  Class         Status     Local  Path                                                                  
--  -------                       -----------  -----         ------     -----  ----                                                                  
60  Tutorial Project              Default DB   timescale_db  AVAILABLE  Local  /home/user/mainsequence/my_organization/projects/tutorial-project

Once mapped, you'll see the project under your mainsequence folder structure (for example, a src directory with a data_nodes module, plus typical files like pyproject.toml, README.md, and requirements.txt).

Open your project in VS Code and select your Python environment (the tutorial was written using Python 3.11.9). We'll use uv to manage dependencies and dev workflow.

Open a terminal in VS Code (Ctrl+`), create a virtual environment, activate it, and install uv:

Windows PowerShell:

python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install uv

macOS/Linux:

python -m venv .venv
source .venv/bin/activate
pip install uv

Select the Python interpreter from your new virtual environment in VS Code (Ctrl+Shift+P > Python: Select Interpreter).

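Sync the project's dependencies. Note that uv sync installs what is declared in pyproject.toml (and uv.lock, when present), not requirements.txt: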

uv sync

From now on, add libraries with:

uv add library_name

If your project depends on environment variables, verify they're set (for example, VFB_PROJECT_PATH). You can check environment variables with:

Windows PowerShell:

$env:VFB_PROJECT_PATH

macOS/Linux:

echo $VFB_PROJECT_PATH

To set an environment variable temporarily (for the current session only):

Windows PowerShell:

$env:VFB_PROJECT_PATH = "C:\Users\YourName\mainsequence\my_organization\projects\tutorial-project"

macOS/Linux:

export VFB_PROJECT_PATH="/home/user/mainsequence/my_organization/projects/tutorial-project"
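
The same check can be done from Python, for example at the top of a launcher script. The sketch below assumes only that VFB_PROJECT_PATH should point to an existing directory; the helper name resolve_project_path is hypothetical and not part of the Main Sequence SDK.

```python
import os
from pathlib import Path


def resolve_project_path(variable: str = "VFB_PROJECT_PATH") -> Path:
    """Return the directory named by the variable, or raise with a clear message."""
    raw = os.environ.get(variable)
    if not raw:
        raise RuntimeError(f"{variable} is not set in this session")
    path = Path(raw).expanduser()
    if not path.is_dir():
        raise RuntimeError(f"{variable} points to a missing directory: {path}")
    return path
```

Failing early with an explicit message is usually easier to debug than a path error raised later inside a node update.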

Set up a local environment

Your new project comes pre-configured with the latest version of the Main Sequence SDK and other helpful libraries, set either by default or by your engineering team.

To quickly set up the environment, press Cmd+Shift+P (Ctrl+Shift+P on Windows/Linux), type Tasks: Run Task, and select Set environment. If VS Code has not done so automatically, make the new environment the default interpreter: press Cmd+Shift+P again, type Python: Select Interpreter, and select Use Python from python.defaultInterpreterPath.

3. Build Your First Data Nodes

Key concepts: data DAGs, DataNode, dependencies, update_hash, and storage_hash.

Main Sequence encourages you to model workflows as data DAGs (directed acyclic graphs), composing your work into small steps called data nodes, each performing a single transformation.

Create a new file at src\data_nodes\example_nodes.py (Windows) or src/data_nodes/example_nodes.py (macOS/Linux), and define your first node, DailyRandomNumber, by subclassing DataNode.

You can find the complete code for the subsequent data nodes in the examples folder.

from typing import Dict, Union

import pandas as pd

from mainsequence.tdag.data_nodes import DataNode, APIDataNode
import mainsequence.client as msc
import numpy as np
from pydantic import BaseModel, Field


class VolatilityConfig(BaseModel):
    center: float = Field(
        ...,
        title="Standard Deviation",
        description="Standard deviation of the normal distribution (must be > 0).",
        examples=[0.1, 1.0, 2.5],
        gt=0,  # constraint: strictly positive
        le=1e6,  # example upper bound (optional)
        multiple_of=0.0001,  # example precision step (optional)
    )
    skew: bool


class RandomDataNodeConfig(BaseModel):
    mean: float = Field(..., ignore_from_storage_hash=False, title="Mean",
                        description="Mean for the random normal distribution generator")
    std: VolatilityConfig = Field(VolatilityConfig(center=1, skew=True), ignore_from_storage_hash=True,
                                  title="Vol Config",
                                  description="Vol Configuration")


class DailyRandomNumber(DataNode):
    """
    Example Data Node that generates one random number every day
    """

    def __init__(self, node_configuration: RandomDataNodeConfig, *args, **kwargs):
        """
        :param node_configuration: Configuration containing mean and std parameters
        :param kwargs: Additional keyword arguments
        """
        self.node_configuration = node_configuration
        self.mean = node_configuration.mean
        self.std = node_configuration.std
        super().__init__(*args, **kwargs)

    def get_table_metadata(self) -> msc.TableMetaData:
        TS_ID = f"example_random_number_{self.mean}_{self.std}"
        meta = msc.TableMetaData(identifier=TS_ID,
                                description="Example Data Node")

        return meta

    def update(self) -> pd.DataFrame:
        """Draw daily samples from N(mean, std) since last run (UTC days)."""
        today = pd.Timestamp.now("UTC").normalize()
        last = self.update_statistics.max_time_index_value
        if last is not None and last >= today:
            return pd.DataFrame()
        return pd.DataFrame(
            {"random_number": [np.random.normal(self.mean, self.std.center)]},
            index=pd.DatetimeIndex([today], name="time_index", tz="UTC"),
        )

    def dependencies(self) -> Dict[str, Union["DataNode", "APIDataNode"]]:
        """
        This node does not depend on any other data nodes.
        """
        return {}
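
The guard at the top of update() is what makes the node incremental: it only produces a row when no observation exists yet for the current UTC day. Isolated from the SDK, the decision can be sketched with the standard library alone. The function name needs_update is hypothetical, and the last argument stands in for update_statistics.max_time_index_value.

```python
from datetime import datetime, timezone
from typing import Optional


def needs_update(last: Optional[datetime], now: datetime) -> bool:
    """Return True when no observation exists yet for the current UTC day."""
    # Normalize "now" to midnight UTC, like pd.Timestamp.now("UTC").normalize().
    today = now.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return last is None or last < today


now = datetime(2024, 5, 2, 15, 30, tzinfo=timezone.utc)
print(needs_update(None, now))                                       # True: first run
print(needs_update(datetime(2024, 5, 1, tzinfo=timezone.utc), now))  # True: last row is from yesterday
print(needs_update(datetime(2024, 5, 2, tzinfo=timezone.utc), now))  # False: today is already stored
```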

DataNode Recipe

To create a data node we must follow the same recipe every time:

Extend the base class mainsequence.tdag.DataNode
Implement the constructor method __init__()
Implement the dependencies() method
Implement the update() method

The update() Method

The update() method has only one requirement: it must always return a pandas.DataFrame with the following characteristics.

Data Frame Structure Requirements

The first index level must always be a timezone-aware datetime in UTC (datetime.datetime with tzinfo set to UTC).
All column names in the DataFrame must be lowercase and no more than 63 characters long.
Column data types are only allowed to be float, int, or str. Any date information must be transformed to int or float.
If there is no new data to return, return an empty pd.DataFrame(); otherwise the DataFrame must contain at least one row.
A MultiIndex DataFrame is only allowed when the first index level is a UTC datetime and the second index level is of type str and named unique_identifier.
For a single-index DataFrame, the index must not contain duplicate values. For a MultiIndex DataFrame, there must be no duplicate combinations of (time_index, unique_identifier).
The name of the first index level must always be time_index, and it is strongly recommended that it represents the observation time of the time series. For example, if the DataFrame stores time bars, time_index should represent the moment the bar is observed, not when the bar started.
If dates are stored in columns, they must be represented as numeric timestamps (int or float), consistent with the allowed column types.
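
Several of these rules can be checked locally before a node runs. The following is a minimal sketch, not the validation Main Sequence itself performs: the function name check_update_frame is hypothetical, and it covers only the index name, UTC timezone, second index level name, duplicates, and column naming rules.

```python
import pandas as pd


def check_update_frame(df: pd.DataFrame) -> list[str]:
    """Return a list of rule violations; an empty list means the frame passes."""
    problems: list[str] = []
    if df.empty:
        return problems  # an empty frame signals "no new data"

    first_level = df.index.get_level_values(0)
    if first_level.name != "time_index":
        problems.append("first index level must be named 'time_index'")
    if not isinstance(first_level, pd.DatetimeIndex) or str(first_level.tz) != "UTC":
        problems.append("first index level must hold UTC datetimes")

    if isinstance(df.index, pd.MultiIndex):
        if df.index.nlevels != 2 or df.index.names[1] != "unique_identifier":
            problems.append("second index level must be named 'unique_identifier'")

    if df.index.duplicated().any():
        problems.append("index contains duplicate entries")

    for column in df.columns:
        name = str(column)
        if name != name.lower() or len(name) > 63:
            problems.append(f"column {name!r} must be lowercase and at most 63 characters")
    return problems


frame = pd.DataFrame(
    {"random_number": [0.42]},
    index=pd.DatetimeIndex([pd.Timestamp("2024-01-01", tz="UTC")], name="time_index"),
)
print(check_update_frame(frame))  # []
```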

Next, create scripts\random_number_launcher.py to run the node:

from src.data_nodes.example_nodes import DailyRandomNumber, RandomDataNodeConfig

def main():
    daily_node = DailyRandomNumber(node_configuration=RandomDataNodeConfig(mean=0.0))
    daily_node.run()

if __name__ == "__main__":
    main()

To run and debug in VS Code, configure a launch file at .vscode\launch.json using the configuration for your operating system. Alternatively, you can ask Copilot or your AI assistant to generate it with a prompt such as:

Build me a debug launcher called "Debug random_number_launcher"
for my file scripts/random_number_launcher.py

Windows:

{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Debug random_number_launcher",
            "type": "debugpy",
            "request": "launch",
            "program": "${workspaceFolder}\\scripts\\random_number_launcher.py",
            "console": "integratedTerminal",
            "env": {
                "PYTHONPATH": "${workspaceFolder}"
            },
            "python": "${workspaceFolder}\\.venv\\Scripts\\python.exe"
        }
    ]
}

macOS/Linux:

{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Debug random_number_launcher",
            "type": "debugpy",
            "request": "launch",
            "program": "${workspaceFolder}/scripts/random_number_launcher.py",
            "console": "integratedTerminal",
            "env": {
                "PYTHONPATH": "${workspaceFolder}"
            },
            "python": "${workspaceFolder}/.venv/bin/python"
        }
    ]
}
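
Since the two configurations differ only in path separators and the interpreter location, they can also be generated. The sketch below is a convenience under stated assumptions: the helper name debug_configuration is hypothetical, and it assumes the .venv layout and scripts folder used in this tutorial.

```python
import json
import os


def debug_configuration(script_name: str) -> dict:
    """Build one launch.json entry for a script under scripts/."""
    windows = os.name == "nt"
    sep = "\\" if windows else "/"
    # The virtual environment keeps its interpreter in a different place per OS.
    interpreter_parts = (
        [".venv", "Scripts", "python.exe"] if windows else [".venv", "bin", "python"]
    )
    return {
        "name": f"Debug {script_name}",
        "type": "debugpy",
        "request": "launch",
        "program": sep.join(["${workspaceFolder}", "scripts", f"{script_name}.py"]),
        "console": "integratedTerminal",
        "env": {"PYTHONPATH": "${workspaceFolder}"},
        "python": sep.join(["${workspaceFolder}", *interpreter_parts]),
    }


launch = {
    "version": "0.2.0",
    "configurations": [debug_configuration("random_number_launcher")],
}
print(json.dumps(launch, indent=4))
```

The printed JSON can be pasted into .vscode/launch.json; adding another launcher is then one more debug_configuration call in the list.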

Go back to random_number_launcher.py. In the top-right corner of VS Code, open the Run Python File dropdown, choose Python Debugger: Debug using launch.json, and select the debug configuration you just created.

This will execute the configuration. Then open:

https://main-sequence.app/dynamic-table-metadatas/

Search for dailyrandom. You should see your data node and its table.

Click the storage hash, then in the table's context menu (the … button), select Explore Table Data to confirm that your node persisted data.

Add a Dependent Data Node

Now extend the workflow with a node that depends on DailyRandomNumber. Add the following to src\data_nodes\example_nodes.py:

class DailyRandomAddition(DataNode):
    def __init__(self, mean: float, std: float, *args, **kwargs):
        self.mean = mean
        self.std = std
        self.daily_random_number_data_node = DailyRandomNumber(
            *args, node_configuration=RandomDataNodeConfig(mean=0.0), **kwargs
        )
        super().__init__(*args, **kwargs)

    def dependencies(self):
        return {"number_generator": self.daily_random_number_data_node}

    def update(self) -> pd.DataFrame:
        """Add today's N(mean, std) draw to the value produced by DailyRandomNumber."""
        today = pd.Timestamp.now("UTC").normalize()
        last = self.update_statistics.max_time_index_value
        if last is not None and last >= today:
            return pd.DataFrame()
        random_number = np.random.normal(self.mean, self.std)
        dependency_noise = self.daily_random_number_data_node.get_df_between_dates(
            start_date=today, great_or_equal=True
        ).iloc[0]["random_number"]
        self.logger.info(f"random_number={random_number} dependency_noise={dependency_noise}")

        return pd.DataFrame(
            {"random_number": [random_number + dependency_noise]},
            index=pd.DatetimeIndex([today], name="time_index", tz="UTC"),
        )

This simply defines a dependent node (DailyRandomAddition) that references and uses the output of DailyRandomNumber.
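
Because dependencies form a directed acyclic graph, an update order always exists in which every node runs after the nodes it depends on. The sketch below illustrates that ordering idea with the standard library; it is not how Main Sequence schedules updates internally, and the dictionary is a hand-written stand-in for what dependencies() returns.

```python
from graphlib import TopologicalSorter

# Each key maps a node to the set of nodes it depends on.
dependency_graph = {
    "DailyRandomNumber": set(),
    "DailyRandomAddition": {"DailyRandomNumber"},
}

# static_order() yields every node after all of its dependencies.
update_order = list(TopologicalSorter(dependency_graph).static_order())
print(update_order)  # ['DailyRandomNumber', 'DailyRandomAddition']
```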

Create a launcher at scripts\random_daily_addition_launcher.py:

from src.data_nodes.example_nodes import DailyRandomAddition


daily_node = DailyRandomAddition(mean=0.0, std=1.0)
daily_node.run(debug_mode=True, force_update=True)

To run this launcher, add a new debug configuration to the configurations list in your .vscode/launch.json (or duplicate the existing entry and change its name and program path).

(Windows):

        {
            "name": "Debug random_daily_addition_launcher",
            "type": "debugpy",
            "request": "launch",
            "program": "${workspaceFolder}\\scripts\\random_daily_addition_launcher.py",
            "console": "integratedTerminal",
            "env": {
                "PYTHONPATH": "${workspaceFolder}"
            },
            "python": "${workspaceFolder}\\.venv\\Scripts\\python.exe"
        }

(macOS/Linux):

        {
            "name": "Debug random_daily_addition_launcher",
            "type": "debugpy",
            "request": "launch",
            "program": "${workspaceFolder}/scripts/random_daily_addition_launcher.py",
            "console": "integratedTerminal",
            "env": {
                "PYTHONPATH": "${workspaceFolder}"
            },
            "python": "${workspaceFolder}/.venv/bin/python"
        }

Then go back to random_daily_addition_launcher.py, open the Run/Debug dropdown at the top right, choose Python Debugger: Debug using launch.json, and select the new "Debug random_daily_addition_launcher" configuration. After it runs, return to the Dynamic Table Metadatas page to see the new table:

https://main-sequence.app/dynamic-table-metadatas/?search=dailyrandom&storage_hash=&identifier=

Open the dailyrandomaddition_XXXXX table to explore it. For a visual of the dependency structure, click the update process arrow and then the update hash.

You'll see the dependency graph for this workflow:

4. update_hash vs. storage_hash

A DataNode does two critical things in Main Sequence:

Controls the update process for your data (sequential or time-series based).
Persists data in the Data Engine (think of it as a managed database—no need to handle schemas, sessions, etc.).

To support both, each DataNode uses two identifiers:

update_hash: a unique hash derived from the combination of arguments that define an update process. In the random-number example, that might include mean and std.
storage_hash: an identifier for where data is stored. It can ignore specific arguments so multiple update processes can write to the same table.

Why do this? Sometimes you want to store data from different processes in a single table. While the simple example here is contrived, this pattern becomes very useful with multi-index tables.

Now update your daily random number launcher to run two update processes with different volatility configurations but the same storage.

To do this, modify scripts\random_number_launcher.py to be as follows:

from src.data_nodes.example_nodes import DailyRandomNumber, RandomDataNodeConfig, VolatilityConfig

low_vol = VolatilityConfig(center=0.5, skew=False)
high_vol = VolatilityConfig(center=2.0, skew=True)


daily_node_low = DailyRandomNumber(node_configuration=RandomDataNodeConfig(mean=0.0, std=low_vol))
daily_node_high = DailyRandomNumber(
    node_configuration=RandomDataNodeConfig(mean=0.0, std=high_vol)
)

daily_node_low.run(debug_mode=True, force_update=True)
daily_node_high.run(debug_mode=True, force_update=True)

Here we create two DailyRandomNumber nodes with different std (Volatility) configurations but the same mean. Since we set ignore_from_storage_hash=True for the std field in RandomDataNodeConfig, both nodes will write to the same underlying table.
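
The relationship between the two identifiers can be illustrated without the SDK. The sketch below is an assumption about the general idea, not the hashing scheme Main Sequence actually uses: the function config_hash is hypothetical, and plain dictionaries stand in for RandomDataNodeConfig.

```python
import hashlib
import json


def config_hash(config: dict, ignored: frozenset = frozenset()) -> str:
    """Hash a configuration dict, skipping any keys listed in `ignored`."""
    kept = {key: value for key, value in config.items() if key not in ignored}
    payload = json.dumps(kept, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:16]


low = {"mean": 0.0, "std": {"center": 0.5, "skew": False}}
high = {"mean": 0.0, "std": {"center": 2.0, "skew": True}}

# Every argument counts toward the update hash, so the two processes differ.
update_low, update_high = config_hash(low), config_hash(high)

# "std" is skipped for the storage hash, so both map to the same table.
storage_low = config_hash(low, ignored=frozenset({"std"}))
storage_high = config_hash(high, ignored=frozenset({"std"}))

print(update_low != update_high)    # True
print(storage_low == storage_high)  # True
```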

Run the updated launcher in VS Code as before. After it runs, return to the Dynamic Table Metadatas page to see the table for DailyRandomNumber.

You'll see that you have a single table with three different update processes (you just added two new processes by running the modified launcher):

Congratulations! You've built your first Data Nodes in Main Sequence. In the next part of the tutorial, we'll explore scheduling and automating these nodes and more.
