Adding a New Dagster Code Location

This guide covers everything needed to add a new isolated Dagster code location to the rev-sci-dagster monorepo.

Canonical example: src/underwriting_ml/ — refer to it for any pattern not covered here.

Quick Start

If you have Claude Code available, the scaffold skill handles all of this:

/scaffold-code-location network_analytics analytics 4002

Arguments: <package_name> <short_name> <port>. The skill creates all files, modifies all configs, and runs verification.

For manual setup, follow the phases below.

Naming Conventions

| Concept | Convention | Example |
| --- | --- | --- |
| Package directory | src/<snake_case>/ | src/network_analytics/ |
| Python package name | snake_case | network_analytics |
| PyPI/pyproject name | kebab-case | network-analytics |
| Docker image | dagster-<kebab-case> | dagster-network-analytics |
| Dockerfile | Dockerfile.<dotted.name> | Dockerfile.network.analytics |
| ClickHouse database | snake_case (matches package) | network_analytics |
| Workspace key | Short name for CLI convenience | analytics |
| Docker Swarm service | snake_case (matches package) | network_analytics |
| Build tag variable | DAGSTER_BUILD_TAG_<UPPER> | DAGSTER_BUILD_TAG_ANALYTICS |

Port Allocation

Each code location runs a gRPC code server on a unique port.

| Port | Location |
| --- | --- |
| 3000 | Dagster webserver |
| 4000 | vanguard_wholesale |
| 4001 | underwriting_ml |
| 4002+ | Next available |

Ports appear in: workspace.yaml, Dockerfile.<name> (EXPOSE + CMD), compose.yml.

Phase 1: Create the Package

Create the directory tree under src/<name>/:

src/<name>/
  __init__.py              # empty package marker
  pyproject.toml           # declares only this location's deps
  definitions.py           # Dagster definitions entry point

pyproject.toml (copy from src/underwriting_ml/pyproject.toml, adjust):

[project]
name = "<kebab-name>"
version = "0.1.0"
requires-python = ">=3.13,<3.14"
description = "<one-line description>"
dependencies = [
    "dagster>=1.12,<1.13",
    "dagster-polars>=0.27.6",
    # location-specific deps only
    "shared",
]

[tool.uv.sources]
shared = { workspace = true }

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[tool.hatch.build.targets.wheel]
packages = ["."]

definitions.py (minimal starting point):

import os
from pathlib import Path

import dagster as dg
from dagster_polars import PolarsParquetIOManager

import <name>.assets


def get_data_dir() -> str:
    if Path("/.dockerenv").exists():
        return "/opt/dagster/shared-data"
    else:
        project_root = Path(__file__).parent.parent.parent
        return str(project_root / "tmp" / "dagster_data")


@dg.definitions
def defs() -> dg.Definitions:
    data_dir = get_data_dir()

    return dg.Definitions(
        resources={
            "polars_io_manager": PolarsParquetIOManager(
                base_dir=f"{data_dir}/<name>"
            ),
        },
        assets=dg.load_assets_from_package_module(
            <name>.assets,
        ),
    )

Create src/<name>/assets/__init__.py as an empty file to start with.

Phase 2: Register in Root Workspace

Edit pyproject.toml (root):

  1. Add to [tool.uv.workspace].members:

    members = ["src/shared", "src/vanguard_wholesale", "src/underwriting_ml", "src/<name>"]
    

  2. Add to [project].dependencies:

    dependencies = [
        "vanguard-wholesale",
        "underwriting-ml",
        "<kebab-name>",
    ]
    

  3. Add to [tool.uv.sources]:

    <kebab-name> = { workspace = true }
    

  4. Add to [tool.hatch.build.targets.wheel].packages:

    packages = ["devtools", "src/shared", "src/vanguard_wholesale", "src/underwriting_ml", "src/<name>"]
    

Then regenerate the lockfile:

uv lock
uv sync --group dev

Phase 3: ClickHouse Integration

If the code location produces Parquet assets queryable via ClickHouse, create the clickhouse subpackage. Otherwise skip this phase.

src/<name>/clickhouse/
  __init__.py              # exports get_bootstrap_config()
  asset_registry.py        # DB_NAME + list[AssetRegistryEntry]
  summary_views.py         # get_summary_views_ddl(db_map)

See src/underwriting_ml/clickhouse/ for the complete pattern. The __init__.py must expose a get_bootstrap_config(config) function that returns a CodeLocationConfig.

Bootstrap auto-discovers code locations that have src/<name>/clickhouse/__init__.py — no registration needed.
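A sketch of what asset_registry.py might contain — the real AssetRegistryEntry comes from src/shared, so the stand-in dataclass and its field names below are assumptions for illustration only:

```python
from dataclasses import dataclass


# Stand-in for the real AssetRegistryEntry from src/shared; the actual
# field names may differ.
@dataclass(frozen=True)
class AssetRegistryEntry:
    asset_key: str   # Dagster asset key, doubling as the Parquet file stem
    table_name: str  # ClickHouse table name to expose it as


# Database name is snake_case and matches the package directory.
DB_NAME = "network_analytics"

ASSETS: list[AssetRegistryEntry] = [
    AssetRegistryEntry(asset_key="example_counts", table_name="example_counts"),
]
```

Note that the module imports only the stdlib, in line with the dagster-free constraint on clickhouse/ described under Common Pitfalls.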

Also add COPY lines to Dockerfile.bootstrap:

COPY src/<name>/__init__.py ./src/<name>/__init__.py
COPY src/<name>/clickhouse ./src/<name>/clickhouse

And add the pyproject.toml stub for UV workspace validation:

COPY src/<name>/pyproject.toml ./src/<name>/pyproject.toml

Phase 4: Create the Dockerfile

Create Dockerfile.<dotted.name> (copy from Dockerfile.underwriting.ml, adjust):

Key things to change:

  • --package <kebab-name> in both uv sync commands
  • COPY src/<name> paths
  • ENV CODE_LOCATION_NAME=<name>
  • EXPOSE <port>, matching your allocated port
  • CMD port: -p <port>
  • CMD definitions path: -f src/<name>/definitions.py

The entrypoint.code-location.sh is shared across all locations — it reads CODE_LOCATION_NAME to find the right env file at runtime.
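For orientation, the location-specific lines might look roughly like this — a sketch only; copy the real Dockerfile.underwriting.ml for the stage layout and base image, and note that the exact CMD form (dagster api grpc here) is an assumption:

```dockerfile
# Sketch of the lines that change per location — not a complete Dockerfile.
RUN uv sync --frozen --no-dev --package network-analytics   # kebab-case name

COPY src/network_analytics ./src/network_analytics

ENV CODE_LOCATION_NAME=network_analytics
EXPOSE 4002

# The gRPC server listens on the allocated port and points at this
# location's definitions module (assumed command form):
CMD ["dagster", "api", "grpc", "-h", "0.0.0.0", "-p", "4002", "-f", "src/network_analytics/definitions.py"]
```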

Phase 5: Docker Compose and Swarm

compose.yml — add a service block (copy the pattern from the underwriting_ml service):

  • Build with dockerfile: ./Dockerfile.<dotted.name>
  • Set container_name and image
  • Add the new service to depends_on for the webserver and daemon services
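As a sketch, the new compose service block might look like this — field values follow the naming table above, but copy the real underwriting_ml block for the full set of fields (env files, volumes, and so on):

```yaml
services:
  network_analytics:
    build:
      context: .
      dockerfile: ./Dockerfile.network.analytics
    container_name: network_analytics
    image: dagster-network-analytics
```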

docker-stack.yml — add a service block:

  • Image: ${DAGSTER_PLAN_REGISTRY:-registry.plan.com:5000}/dagster-<kebab-name>:${DAGSTER_BUILD_TAG_<UPPER>:-latest}
  • Set the DAGSTER_CURRENT_IMAGE environment variable
  • Mount the dagster_mount_storage volume
  • Attach to the dagster network

Phase 6: Production Workspace and CI

workspace.yaml — add a gRPC server entry:

  - grpc_server:
      host: <name>
      port: <port>
      location_name: "<name>"

docker-bake.hcl — add a build target:

target "dagster-<kebab-name>" {
  context = "."
  dockerfile = "Dockerfile.<dotted.name>"
  tags = [
    "registry.plan.com:5000/dagster-<kebab-name>:latest"
  ]
  platforms = ["linux/amd64"]
  push = true
}

Add to the default group targets list.

bitbucket-pipelines.yml:

  • Add a &build-push-<name> anchor (copy the pattern from an existing one)
  • Add a &deploy-<name> anchor
  • Add the new paths to the changeset.includePaths conditions
  • Add the location to the deploy-code-location-stage steps

Phase 7: Environment Templates

Create environments/<name>.env.template with all required environment variables (empty values). This serves as documentation for what secrets need configuring.

At deploy time, env files are NFS-mounted at /opt/dagster/shared-data/environments/<name>.env and sourced by the shared entrypoint.
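A sketch of such a template — the variable names below are illustrative assumptions; list whatever your location actually reads, with every value left blank:

```shell
# environments/network_analytics.env.template
# Values intentionally empty — filled in on the NFS share at deploy time.
CLICKHOUSE_HOST=
CLICKHOUSE_USER=
CLICKHOUSE_PASSWORD=
```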

Bitbucket Deployment Variables

After merging, configure these repository variables in Bitbucket:

| Variable | Value | Scope |
| --- | --- | --- |
| SERVICE_IMAGE_NAME_<UPPER> | dagster-<kebab-name> | Repository |
| DAGSTER_BUILD_TAG_<UPPER> | (set by CI pipeline) | Pipeline |

What's Auto-Discovered

After Phase A refactoring (DAG-23), these files no longer need manual registration:

| Tool | Discovery mechanism |
| --- | --- |
| uv run check-defs | Scans src/*/definitions.py |
| uv run dev-local | Scans .dagster_home/workspace.*.yaml, auto-generates on first use |
| bootstrap-clickhouse | Scans src/*/clickhouse/__init__.py |

You can use --only <name> with the full package name immediately. Short-name aliases in devtools/check_defs.py and devtools/dev_local.py are optional CLI convenience.

Verification

Run these after completing all phases:

uv lock                                    # regenerate lockfile
uv sync --group dev                        # install everything locally
uv tree --package <kebab-name>             # confirm dependency isolation
uv run check-defs                          # definitions validate
uv run check-defs --only <name>            # single-location validation
uv run dev-local --only <name>             # loads in Dagster UI at localhost:3001
uv run dev-local --all                     # all locations load together
uv run pytest                              # tests pass
bootstrap-clickhouse --locations <name>    # ClickHouse schema created (if applicable)

Common Pitfalls

  • Package name vs. PyPI name: The directory and import name use snake_case (underwriting_ml), but pyproject.toml name uses kebab-case (underwriting-ml). The --package flag in Dockerfiles and uv commands uses the kebab-case name.
  • Dockerfile pyproject stubs: UV validates the full workspace lockfile structure during uv sync --frozen. Every workspace member's pyproject.toml must be COPY'd into the builder stage, even if you're only installing one package.
  • Port collisions: Check workspace.yaml for the next free port before choosing one. The port must match in workspace.yaml, Dockerfile (EXPOSE + CMD), and compose.yml.
  • ClickHouse modules must be dagster-free: The clickhouse/ subpackage is copied into the lightweight bootstrap image which doesn't have dagster installed. Only import polars, shared, and stdlib.
  • IO manager base_dir: Each location writes to {data_dir}/<name>. Don't share a base_dir between locations — it causes asset key collisions.
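The last pitfall in code form — each location owns one subdirectory under the shared data dir, mirroring the base_dir line in definitions.py (a sketch; the helper name is illustrative):

```python
from pathlib import Path


def io_manager_base_dir(data_dir: str, location: str) -> str:
    # One subdirectory per code location. Sharing a base_dir would let two
    # locations with the same asset key overwrite each other's Parquet files.
    return str(Path(data_dir) / location)
```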

Architecture Notes

  • UV workspaces isolate dependencies per code location. Each member's pyproject.toml declares only what it needs. Docker builds use --package to install only the target member's dependency tree.
  • src/shared/ is infrastructure, not a code location. It provides ClickHouse config, schema mapper, view generator, and bootstrap. All code locations depend on it.
  • Per-location ClickHouse databases: Bootstrap creates one database per location. Cross-domain JOINs use fully-qualified database.view syntax via db_map.
  • Local dev via dg dev runs code locations in-process. Production runs each in a separate gRPC container discovered via workspace.yaml.
  • Multi-repo: Splitting code locations into separate repositories is feasible in theory but the current platform conventions (shared lockfile, workspace-aware Dockerfiles, Bitbucket pipeline anchors) make it a deliberate engineering project, not a config change.
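The cross-domain JOIN point above can be sketched in code — assuming db_map is a plain mapping of code location to database name (the real shape lives in src/shared), and with hypothetical view and table names:

```python
def cross_domain_view_ddl(db_map: dict[str, str]) -> str:
    # Every table reference is fully qualified via db_map, so the view works
    # regardless of which per-location database it is created in.
    uw = db_map["underwriting_ml"]
    na = db_map["network_analytics"]
    return (
        f"CREATE OR REPLACE VIEW {na}.risk_summary AS\n"
        f"SELECT n.key, n.n, u.score\n"
        f"FROM {na}.example_counts AS n\n"
        f"JOIN {uw}.scores AS u USING (key)"
    )
```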