Getting Started with Main Sequence Part 1
This tutorial walks you through creating a project, setting it up on your machine, and building your first data nodes. The goal is to make each step clear and actionable.
You can see the final repository here: https://github.com/mainsequence-sdk/TutorialProject. You can also follow the tutorial in video format here: https://www.youtube.com/watch?v=4e_5UmvX27Q&list=PLbqT9fcxsYzoC67baydxNSSVxRm1U_aVV
1. Create a Project
Log in to Main Sequence. You'll land on the Projects page. Projects help you organize work, data, and compute. Let's create the first one: choose Create New Project and name it Tutorial Project.


After a few seconds, your new project should appear with a checkmark indicating it's initialized. Click the project to open it.

On the Project Details page you'll see:
- A green status indicator confirming the project was set up correctly.
- The repository and branch (e.g., tutorial-project/main) and the latest commit.
- Two Jobs representing background processes—no action needed for now.

2. Work on the Project Locally
We'll use Visual Studio Code for this tutorial. If you don't have it, download it from the official site. Also make sure you have Python 3.11 or later installed; if not, download it from the official site and follow the installation instructions.
Setting up via VS Code Extension (Recommended)
The recommended way to work with Main Sequence projects is via the VS Code extension (it just makes things smoother), so first install the extension:
- Open the Extensions view in VS Code:
  - macOS: press Cmd+Shift+X
  - Windows/Linux: press Ctrl+Shift+X
  - Or click the Extensions icon in the Activity Bar on the left side of the window.
- Search for the extension: in the Extensions search box, type Main Sequence and press Enter.

If you don’t find the extension, you can install it directly from the marketplace:
Main Sequence VS Code Extension – VS Code Marketplace
Once the extension is installed, log in to your account. You should see your project in the Projects view.
Click Set up project locally and wait a few seconds for the project to be mapped locally.

After a few seconds, refresh the Projects view and you should see your project mapped locally (in blue).
Open the project’s context menu and select Open Folder. This will open a VS Code window with your project mapped locally.
You should now see your project in the Current Project panel.

Setting up via CLI
Open a PowerShell terminal (Windows) or your preferred terminal (macOS/Linux) and run the following commands.
First, install the Main Sequence Python package in your environment:
pip install mainsequence
With the package installed, you can use the CLI from your machine:
mainsequence --help
# or if your system does not allow automatic additions to the path
python -m mainsequence --help

Now log in via the CLI:
mainsequence login [USER_NAME]
You should see a list of your projects:
Projects:
  ID  Project          Data Source  Class         Status     Local  Path
  --  ---------------  -----------  ------------  ---------  -----  ----
  60  TutorialProject  Default DB   timescale_db  AVAILABLE  —      —
The Path column is empty because the project isn't mapped locally yet. Use the project command to see your options:
mainsequence project --help
Output:
Usage: mainsequence project [OPTIONS] COMMAND [ARGS]...
Project commands
╭─ Options ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --help Show this message and exit. │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ list List projects with Local status and path. │
│ open Open the local folder in the OS file manager. │
│ delete-local Unlink the mapped folder, optionally delete it. │
│ open-signed-terminal Open a terminal window in the project directory with ssh-agent started and the repo's key added. │
│ set-up-locally Set up project locally. │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Map the project to your machine and list again to confirm the mapping:
mainsequence project set-up-locally [PROJECT_ID]
mainsequence project list
Output:
Windows:
  ID  Project           Data Source  Class         Status     Local  Path
  --  ----------------  -----------  ------------  ---------  -----  ------------------------------------------------------------------------
  60  Tutorial Project  Default DB   timescale_db  AVAILABLE  Local  C:\Users\YourName\mainsequence\my_organization\projects\tutorial-project
macOS/Linux:
  ID  Project           Data Source  Class         Status     Local  Path
  --  ----------------  -----------  ------------  ---------  -----  --------------------------------------------------------------
  60  Tutorial Project  Default DB   timescale_db  AVAILABLE  Local  /home/user/mainsequence/my_organization/projects/tutorial-project
Once mapped, you'll see the project under your mainsequence folder structure (for example, a src directory with a data_nodes module, plus typical files like pyproject.toml, README.md, and requirements.txt).
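For orientation, the mapped folder might look roughly like this (the exact contents depend on the project template; the tree below is an assumption based on the files mentioned above):

```
tutorial-project/
├── pyproject.toml
├── README.md
├── requirements.txt
└── src/
    └── data_nodes/
        ├── __init__.py
        └── example_nodes.py
```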
Open your project in VS Code and select your Python environment (the tutorial was written using Python 3.11.9). We'll use uv to manage dependencies and dev workflow.
Open a terminal in VS Code (Ctrl+`), create a virtual environment, then activate it and install uv:
Windows PowerShell:
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install uv
macOS/Linux:
python -m venv .venv
source .venv/bin/activate
pip install uv
Select the Python interpreter from your new virtual environment in VS Code (Ctrl+Shift+P > Python: Select Interpreter).
Sync the project's dependencies:
uv sync
From now on, add libraries with:
uv add library_name
If your project depends on environment variables, verify they're set (for example, VFB_PROJECT_PATH). You can check environment variables with:
Windows PowerShell:
$env:VFB_PROJECT_PATH
macOS/Linux:
echo $VFB_PROJECT_PATH
To set an environment variable temporarily (for current session):
Windows PowerShell:
$env:VFB_PROJECT_PATH = "C:\Users\YourName\mainsequence\my_organization\projects\tutorial-project"
macOS/Linux:
export VFB_PROJECT_PATH="/home/user/mainsequence/my_organization/projects/tutorial-project"
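The commands above only last for the current session. To persist the variable across sessions on macOS/Linux, a common approach (standard shell practice, not Main Sequence specific) is to append the export to your shell profile:

```shell
# Persist the variable by appending it to your shell profile
# (~/.bashrc here; use ~/.zshrc if you use zsh). The path is an example.
echo 'export VFB_PROJECT_PATH="$HOME/mainsequence/my_organization/projects/tutorial-project"' >> ~/.bashrc

# Apply it to the current shell and verify
source ~/.bashrc
echo "$VFB_PROJECT_PATH"
```

On Windows, `setx VFB_PROJECT_PATH "C:\...your path..."` persists the variable for future PowerShell sessions (it does not affect the current one).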
Set up a local environment
Your new project comes pre-configured with the latest version of the Main Sequence SDK and other helpful libraries, set either by default or by your engineering team.
To quickly set up the environment, press Cmd+Shift+P (Ctrl+Shift+P on Windows/Linux), type Tasks: Run Task, and select Set environment.
If Visual Studio Code hasn't set the new environment as the default automatically, press Cmd+Shift+P again, type Python: Select Interpreter, and select Use Python from python.defaultInterpreterPath.

3. Build Your First Data Nodes
Key concepts: data DAGs, DataNode, dependencies, update_hash, and storage_hash.
Main Sequence encourages you to model workflows as data DAGs (directed acyclic graphs), composing your work into small steps called data nodes, each performing a single transformation.
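To make the DAG idea concrete before touching the SDK, here is a minimal, framework-free sketch in plain Python. It only illustrates the concept of small nodes evaluated in dependency order; it is not the Main Sequence API:

```python
# Toy data DAG: each node is a function plus the names of the nodes
# it depends on. Evaluating a node first evaluates its dependencies.
nodes = {
    "raw": (lambda deps: [1.0, 2.0, 3.0], []),
    "doubled": (lambda deps: [x * 2 for x in deps["raw"]], ["raw"]),
    "total": (lambda deps: sum(deps["doubled"]), ["doubled"]),
}

def evaluate(name, cache=None):
    """Recursively evaluate a node, memoizing results so each node runs once."""
    if cache is None:
        cache = {}
    if name not in cache:
        func, dep_names = nodes[name]
        cache[name] = func({d: evaluate(d, cache) for d in dep_names})
    return cache[name]

print(evaluate("total"))  # 12.0
```

A DataNode plays the same role as one entry in this dictionary: its dependencies() method names the upstream nodes, and its update() method is the transformation.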
Create a new file at src\data_nodes\example_nodes.py (Windows) or src/data_nodes/example_nodes.py (macOS/Linux), and define your first node, DailyRandomNumber, by subclassing DataNode.
You can find the complete code for the subsequent data nodes in the examples folder.
from typing import Dict, Union

import numpy as np
import pandas as pd
from pydantic import BaseModel, Field

import mainsequence.client as msc
from mainsequence.tdag.data_nodes import APIDataNode, DataNode


class VolatilityConfig(BaseModel):
    center: float = Field(
        ...,
        title="Standard Deviation",
        description="Standard deviation of the normal distribution (must be > 0).",
        examples=[0.1, 1.0, 2.5],
        gt=0,  # constraint: strictly positive
        le=1e6,  # example upper bound (optional)
        multiple_of=0.0001,  # example precision step (optional)
    )
    skew: bool


class RandomDataNodeConfig(BaseModel):
    mean: float = Field(
        ...,
        ignore_from_storage_hash=False,
        title="Mean",
        description="Mean for the random normal distribution generator",
    )
    std: VolatilityConfig = Field(
        VolatilityConfig(center=1, skew=True),
        ignore_from_storage_hash=True,
        title="Vol Config",
        description="Vol Configuration",
    )


class DailyRandomNumber(DataNode):
    """
    Example data node that generates one random number every day.
    """

    def __init__(self, node_configuration: RandomDataNodeConfig, *args, **kwargs):
        """
        :param node_configuration: Configuration containing mean and std parameters
        :param kwargs: Additional keyword arguments
        """
        self.node_configuration = node_configuration
        self.mean = node_configuration.mean
        self.std = node_configuration.std
        super().__init__(*args, **kwargs)

    def get_table_metadata(self) -> msc.TableMetaData:
        # std is left out of the identifier so that update processes with
        # different volatility configurations can share the same table.
        TS_ID = f"example_random_number_{self.mean}"
        meta = msc.TableMetaData(
            identifier=TS_ID,
            description="Example Data Node",
        )
        return meta

    def update(self) -> pd.DataFrame:
        """Draw daily samples from N(mean, std) since last run (UTC days)."""
        today = pd.Timestamp.now("UTC").normalize()
        last = self.update_statistics.max_time_index_value
        if last is not None and last >= today:
            return pd.DataFrame()
        return pd.DataFrame(
            {"random_number": [np.random.normal(self.mean, self.std.center)]},
            index=pd.DatetimeIndex([today], name="time_index", tz="UTC"),
        )

    def dependencies(self) -> Dict[str, Union["DataNode", "APIDataNode"]]:
        """
        This node does not depend on any other data nodes.
        """
        return {}
DataNode Recipe
To create a data node we must follow the same recipe every time:
- Extend the base class mainsequence.tdag.DataNode
- Implement the constructor method __init__()
- Implement the dependencies() method
- Implement the update() method
The update() Method
The update() method has only one requirement: it must always return a pandas.DataFrame with the following characteristics.
Data Frame Structure Requirements
- The first index level must always be of type datetime.datetime(timezone="UTC").
- All column names in the DataFrame must be lowercase and no more than 63 characters long.
- Column data types are only allowed to be float, int, or str. Any date information must be transformed to int or float.
- The DataFrame must not be empty. If there is no new data to return, an empty pd.DataFrame() must be returned.
- A MultiIndex DataFrame is only allowed when the first index level is of type datetime.datetime(timezone="UTC"), the second index level is of type str, and its name is unique_identifier.
- For a single-index DataFrame, the index must not contain duplicate values. For a MultiIndex DataFrame, there must be no duplicate combinations of (time_index, unique_identifier).
- The name of the first index level must always be time_index, and it is strongly recommended that it represents the observation time of the time series. For example, if the DataFrame stores time bars, time_index should represent the moment the bar is observed, not when the bar started.
- If dates are stored in columns, they must be represented as timestamps.
Next, create scripts\random_number_launcher.py to run the node:
from src.data_nodes.example_nodes import DailyRandomNumber, RandomDataNodeConfig


def main():
    daily_node = DailyRandomNumber(node_configuration=RandomDataNodeConfig(mean=0.0))
    daily_node.run()


if __name__ == "__main__":
    main()
To run and debug in VS Code, you can configure a launch file at .vscode\launch.json:
You can also just ask Copilot or your AI assistant:
Build me a debug launcher called "Debug random_number_launcher" for my file scripts/random_number_launcher.py
Windows:
{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Debug random_number_launcher",
            "type": "debugpy",
            "request": "launch",
            "program": "${workspaceFolder}\\scripts\\random_number_launcher.py",
            "console": "integratedTerminal",
            "env": {
                "PYTHONPATH": "${workspaceFolder}"
            },
            "python": "${workspaceFolder}\\.venv\\Scripts\\python.exe"
        }
    ]
}
macOS/Linux:
{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Debug random_number_launcher",
            "type": "debugpy",
            "request": "launch",
            "program": "${workspaceFolder}/scripts/random_number_launcher.py",
            "console": "integratedTerminal",
            "env": {
                "PYTHONPATH": "${workspaceFolder}"
            },
            "python": "${workspaceFolder}/.venv/bin/python"
        }
    ]
}
Go back to random_number_launcher.py. At the top right corner of VS Code you'll see the Run Python File dropdown; click the Python Debugger: Debug using launch.json option, and finally select the debug configuration you just created.
This will execute the configuration. Then open:
https://main-sequence.app/dynamic-table-metadatas/
Search for dailyrandom. You should see your data node and its table.

Click the storage hash, then in the table's context menu (the … button), select Explore Table Data to confirm that your node persisted data.

Add a Dependent Data Node
Now extend the workflow with a node that depends on DailyRandomNumber. Add the following to src\data_nodes\example_nodes.py:
class DailyRandomAddition(DataNode):
    def __init__(self, mean: float, std: float, *args, **kwargs):
        self.mean = mean
        self.std = std
        self.daily_random_number_data_node = DailyRandomNumber(
            *args, node_configuration=RandomDataNodeConfig(mean=0.0), **kwargs
        )
        super().__init__(*args, **kwargs)

    def dependencies(self):
        return {"number_generator": self.daily_random_number_data_node}

    def update(self) -> pd.DataFrame:
        """Draw daily samples from N(mean, std) since last run (UTC days)."""
        today = pd.Timestamp.now("UTC").normalize()
        last = self.update_statistics.max_time_index_value
        if last is not None and last >= today:
            return pd.DataFrame()
        random_number = np.random.normal(self.mean, self.std)
        dependency_noise = self.daily_random_number_data_node.get_df_between_dates(
            start_date=today, great_or_equal=True
        ).iloc[0]["random_number"]
        self.logger.info(f"random_number={random_number} dependency_noise={dependency_noise}")
        return pd.DataFrame(
            {"random_number": [random_number + dependency_noise]},
            index=pd.DatetimeIndex([today], name="time_index", tz="UTC"),
        )
This simply defines a dependent node (DailyRandomAddition) that references and uses the output of DailyRandomNumber.
Create a launcher at scripts\random_daily_addition_launcher.py:
from src.data_nodes.example_nodes import DailyRandomAddition
daily_node = DailyRandomAddition(mean=0.0, std=1.0)
daily_node.run(debug_mode=True, force_update=True)
Now to run this launcher, add a new debug configuration to your .vscode/launch.json in configurations list (or duplicate the existing config and change the program path and name).
Windows:
{
    "name": "Debug random_daily_addition_launcher",
    "type": "debugpy",
    "request": "launch",
    "program": "${workspaceFolder}\\scripts\\random_daily_addition_launcher.py",
    "console": "integratedTerminal",
    "env": {
        "PYTHONPATH": "${workspaceFolder}"
    },
    "python": "${workspaceFolder}\\.venv\\Scripts\\python.exe"
}
macOS/Linux:
{
    "name": "Debug random_daily_addition_launcher",
    "type": "debugpy",
    "request": "launch",
    "program": "${workspaceFolder}/scripts/random_daily_addition_launcher.py",
    "console": "integratedTerminal",
    "env": {
        "PYTHONPATH": "${workspaceFolder}"
    },
    "python": "${workspaceFolder}/.venv/bin/python"
}
Open random_daily_addition_launcher.py and run the configuration from the Run/Debug dropdown at the top right, choosing "Debug random_daily_addition_launcher". After it runs, return to the Dynamic Table Metadatas page to see the new table:
https://main-sequence.app/dynamic-table-metadatas/?search=dailyrandom&storage_hash=&identifier=
Open the dailyrandomaddition_XXXXX table to explore it. For a visual of the dependency structure, click the update process arrow and then the update hash.

You'll see the dependency graph for this workflow:

4. update_hash vs. storage_hash
A DataNode does two critical things in Main Sequence:
- Controls the update process for your data (sequential or time-series based).
- Persists data in the Data Engine (think of it as a managed database—no need to handle schemas, sessions, etc.).
To support both, each DataNode uses two identifiers:
- update_hash: a unique hash derived from the combination of arguments that define an update process. In the random-number example, that might include mean and std.
- storage_hash: an identifier for where data is stored. It can ignore specific arguments so multiple update processes can write to the same table.
Why do this? Sometimes you want to store data from different processes in a single table. While the simple example here is contrived, this pattern becomes very useful with multi-index tables.
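Conceptually, you can picture the two hashes as the same fingerprint computed over different subsets of the configuration. The sketch below only illustrates that idea; it is not how the SDK actually derives its hashes:

```python
import hashlib
import json

def config_hash(config: dict, ignore: frozenset = frozenset()) -> str:
    """Hash a config dict deterministically, skipping ignored keys (illustrative only)."""
    filtered = {k: v for k, v in sorted(config.items()) if k not in ignore}
    return hashlib.sha256(json.dumps(filtered).encode()).hexdigest()[:12]

low = {"mean": 0.0, "std": 0.5}
high = {"mean": 0.0, "std": 2.0}

# update_hash-like: every argument counts, so the two processes differ
print(config_hash(low) != config_hash(high))  # True

# storage_hash-like: "std" is ignored, so both map to the same table
print(config_hash(low, frozenset({"std"})) == config_hash(high, frozenset({"std"})))  # True
```

This is exactly the effect of ignore_from_storage_hash=True on the std field: it changes only which arguments the storage identifier sees.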
Now update your daily random number launcher to run two update processes with different volatility configurations but the same storage.
To do this, modify scripts\random_number_launcher.py to be as follows:
from src.data_nodes.example_nodes import DailyRandomNumber, RandomDataNodeConfig, VolatilityConfig

low_vol = VolatilityConfig(center=0.5, skew=False)
high_vol = VolatilityConfig(center=2.0, skew=True)

daily_node_low = DailyRandomNumber(node_configuration=RandomDataNodeConfig(mean=0.0, std=low_vol))
daily_node_high = DailyRandomNumber(node_configuration=RandomDataNodeConfig(mean=0.0, std=high_vol))

daily_node_low.run(debug_mode=True, force_update=True)
daily_node_high.run(debug_mode=True, force_update=True)
Here we create two DailyRandomNumber nodes with different std (Volatility) configurations but the same mean. Since we set ignore_from_storage_hash=True for the std field in RandomDataNodeConfig, both nodes will write to the same underlying table.
Run the updated launcher in VS Code as before. After it runs, return to the Dynamic Table Metadatas page to see the table for DailyRandomNumber.
You'll see that you have a single table with three different update processes (you just added two new processes by running the modified launcher):

Congratulations! You've built your first Data Nodes in Main Sequence. In the next part of the tutorial, we'll explore scheduling and automating these nodes and more.