# Adding a New Dagster Code Location

This guide covers everything needed to add a new isolated Dagster code location to the rev-sci-dagster monorepo.

Canonical example: `src/underwriting_ml/`. Refer to it for any pattern not covered here.
## Quick Start

If you have Claude Code available, the scaffold skill handles all of this:

```shell
/scaffold-code-location network_analytics analytics 4002
```

Arguments: `<package_name> <short_name> <port>`. The skill creates all files, modifies all configs, and runs verification.

For manual setup, follow the phases below.
## Naming Conventions

| Concept | Convention | Example |
|---|---|---|
| Package directory | `src/<snake_case>/` | `src/network_analytics/` |
| Python package name | snake_case | `network_analytics` |
| PyPI/pyproject name | kebab-case | `network-analytics` |
| Docker image | `dagster-<kebab-case>` | `dagster-network-analytics` |
| Dockerfile | `Dockerfile.<dotted.name>` | `Dockerfile.network.analytics` |
| ClickHouse database | snake_case (matches package) | `network_analytics` |
| Workspace key | Short name for CLI convenience | `analytics` |
| Docker Swarm service | snake_case (matches package) | `network_analytics` |
| Build tag variable | `DAGSTER_BUILD_TAG_<UPPER>` | `DAGSTER_BUILD_TAG_ANALYTICS` |
## Port Allocation
Each code location runs a gRPC code server on a unique port.
| Port | Location |
|---|---|
| 3000 | Dagster webserver |
| 4000 | vanguard_wholesale |
| 4001 | underwriting_ml |
| 4002+ | Next available |
The port appears in three places and must match in all of them: `workspace.yaml`, `Dockerfile.<name>` (EXPOSE + CMD), and `compose.yml`.
## Phase 1: Create the Package

Create the directory tree under `src/<name>/`:

```
src/<name>/
    __init__.py       # empty package marker
    pyproject.toml    # declares only this location's deps
    definitions.py    # Dagster definitions entry point
```
`pyproject.toml` (copy from `src/underwriting_ml/pyproject.toml` and adjust):

```toml
[project]
name = "<kebab-name>"
version = "0.1.0"
requires-python = ">=3.13,<3.14"
description = "<one-line description>"
dependencies = [
    "dagster>=1.12,<1.13",
    "dagster-polars>=0.27.6",
    # location-specific deps only
    "shared",
]

[tool.uv.sources]
shared = { workspace = true }

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[tool.hatch.build.targets.wheel]
packages = ["."]
```
`definitions.py` (minimal starting point):

```python
from pathlib import Path

import dagster as dg
from dagster_polars import PolarsParquetIOManager

import <name>.assets


def get_data_dir() -> str:
    if Path("/.dockerenv").exists():
        return "/opt/dagster/shared-data"
    project_root = Path(__file__).parent.parent.parent
    return str(project_root / "tmp" / "dagster_data")


@dg.definitions
def defs() -> dg.Definitions:
    data_dir = get_data_dir()
    return dg.Definitions(
        resources={
            "polars_io_manager": PolarsParquetIOManager(
                base_dir=f"{data_dir}/<name>"
            ),
        },
        assets=dg.load_assets_from_package_module(<name>.assets),
    )
```
Create `src/<name>/assets/__init__.py` as an empty file to start with.
## Phase 2: Register in Root Workspace

Edit the root `pyproject.toml`:

1. Add the new member to `[tool.uv.workspace].members`:

   ```toml
   members = ["src/shared", "src/vanguard_wholesale", "src/underwriting_ml", "src/<name>"]
   ```

2. Add the package to `[project].dependencies`:

   ```toml
   dependencies = [
       "vanguard-wholesale",
       "underwriting-ml",
       "<kebab-name>",
   ]
   ```

3. Add a workspace source to `[tool.uv.sources]`:

   ```toml
   <kebab-name> = { workspace = true }
   ```

4. Add the package to `[tool.hatch.build.targets.wheel].packages`:

   ```toml
   packages = ["devtools", "src/shared", "src/vanguard_wholesale", "src/underwriting_ml", "src/<name>"]
   ```

Then regenerate the lockfile:

```shell
uv lock
uv sync --group dev
```
## Phase 3: ClickHouse Integration

If the code location produces Parquet assets queryable via ClickHouse, create the `clickhouse` subpackage. Otherwise skip this phase.

```
src/<name>/clickhouse/
    __init__.py         # exports get_bootstrap_config()
    asset_registry.py   # DB_NAME + list[AssetRegistryEntry]
    summary_views.py    # get_summary_views_ddl(db_map)
```

See `src/underwriting_ml/clickhouse/` for the complete pattern. The `__init__.py` must expose a `get_bootstrap_config(config)` function that returns a `CodeLocationConfig`.

Bootstrap auto-discovers code locations that have `src/<name>/clickhouse/__init__.py`; no registration is needed.

Also add COPY lines to `Dockerfile.bootstrap`, including the `pyproject.toml` stub required for UV workspace validation:

```dockerfile
COPY src/<name>/__init__.py ./src/<name>/__init__.py
COPY src/<name>/clickhouse ./src/<name>/clickhouse
COPY src/<name>/pyproject.toml ./src/<name>/pyproject.toml
```
## Phase 4: Create the Dockerfile

Create `Dockerfile.<dotted.name>` (copy from `Dockerfile.underwriting.ml` and adjust). Key things to change:

- `--package <kebab-name>` in both `uv sync` commands
- `COPY src/<name>` paths
- `ENV CODE_LOCATION_NAME=<name>`
- `EXPOSE <port>` to match your allocated port
- CMD port: `-p <port>`
- CMD definitions path: `-f src/<name>/definitions.py`

The `entrypoint.code-location.sh` is shared across all locations; it reads `CODE_LOCATION_NAME` to find the right env file at runtime.
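Assembled from the bullets above, a sketch of where those pieces land. The stage layout, base image, and exact gRPC server command here are assumptions; `Dockerfile.underwriting.ml` is the authoritative pattern:

```dockerfile
# Illustrative sketch only -- copy Dockerfile.underwriting.ml for the real build.
FROM python:3.13-slim AS builder
WORKDIR /opt/dagster/app
COPY pyproject.toml uv.lock ./
# Stub pyproject.toml files for every workspace member go here (see Common
# Pitfalls), then the target package itself:
COPY src/<name> ./src/<name>
RUN uv sync --frozen --package <kebab-name>

FROM python:3.13-slim
WORKDIR /opt/dagster/app
COPY --from=builder /opt/dagster/app ./
ENV CODE_LOCATION_NAME=<name>
EXPOSE <port>
ENTRYPOINT ["./entrypoint.code-location.sh"]
# gRPC code server; port and -f path must match workspace.yaml and compose.yml.
CMD ["dagster", "api", "grpc", "-h", "0.0.0.0", "-p", "<port>", "-f", "src/<name>/definitions.py"]
```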
## Phase 5: Docker Compose and Swarm

`compose.yml`: add a service block (copy the pattern from the `underwriting_ml` service):

- Build with `dockerfile: ./Dockerfile.<dotted.name>`
- Set `container_name` and `image`
- Add the new service to `depends_on` for the webserver and daemon services
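A hypothetical fragment showing the shape (the webserver service name is an assumption; the existing `underwriting_ml` block is the authoritative pattern):

```yaml
# Hypothetical compose fragment; copy the real underwriting_ml service block.
services:
  <name>:
    build:
      context: .
      dockerfile: ./Dockerfile.<dotted.name>
    container_name: <name>
    image: dagster-<kebab-name>

  # The webserver and daemon services each gain the new location:
  webserver:
    depends_on:
      - <name>
```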
`docker-stack.yml`: add a service block:

- Image: `${DAGSTER_PLAN_REGISTRY:-registry.plan.com:5000}/dagster-<kebab-name>:${DAGSTER_BUILD_TAG_<UPPER>:-latest}`
- Set the `DAGSTER_CURRENT_IMAGE` environment variable
- Mount the `dagster_mount_storage` volume
- Attach to the `dagster` network
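Again as a hypothetical sketch (the volume mount target is an assumption; copy an existing stack service for the real values):

```yaml
# Hypothetical Swarm service; mirror an existing docker-stack.yml entry.
services:
  <name>:
    image: ${DAGSTER_PLAN_REGISTRY:-registry.plan.com:5000}/dagster-<kebab-name>:${DAGSTER_BUILD_TAG_<UPPER>:-latest}
    environment:
      # Tells Dagster which image to launch runs in.
      DAGSTER_CURRENT_IMAGE: ${DAGSTER_PLAN_REGISTRY:-registry.plan.com:5000}/dagster-<kebab-name>:${DAGSTER_BUILD_TAG_<UPPER>:-latest}
    volumes:
      - dagster_mount_storage:/opt/dagster/shared-data  # target path is an assumption
    networks:
      - dagster
```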
## Phase 6: Production Workspace and CI

`workspace.yaml`: add a gRPC server entry:

```yaml
- grpc_server:
    host: <name>
    port: <port>
    location_name: "<name>"
```

`docker-bake.hcl`: add a build target:

```hcl
target "dagster-<kebab-name>" {
  context    = "."
  dockerfile = "Dockerfile.<dotted.name>"
  tags       = ["registry.plan.com:5000/dagster-<kebab-name>:latest"]
  platforms  = ["linux/amd64"]
  push       = true
}
```

Add the new target to the default group's targets list.
`bitbucket-pipelines.yml`:

- Add a `&build-push-<name>` anchor (copy the pattern from an existing location)
- Add a `&deploy-<name>` anchor
- Add the new paths to the `changeset.includePaths` conditions
- Add the new steps to the `deploy-code-location-stage` steps
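The build anchor roughly takes this shape. Every name and path below is illustrative; copy an existing location's anchors verbatim and rename rather than writing from this sketch:

```yaml
# Hypothetical anchor shape; the existing anchors in bitbucket-pipelines.yml
# are the authoritative pattern.
definitions:
  steps:
    - step: &build-push-<name>
        name: build-push-<name>
        condition:
          changesets:
            includePaths:
              - src/<name>/**
              - Dockerfile.<dotted.name>
        script:
          - docker buildx bake dagster-<kebab-name>
```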
## Phase 7: Environment Templates

Create `environments/<name>.env.template` with all required environment variables (empty values). This serves as documentation for which secrets need configuring.

At deploy time, env files are NFS-mounted at `/opt/dagster/shared-data/environments/<name>.env` and sourced by the shared entrypoint.
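A hypothetical template (the variable names below are placeholders; list whatever this location actually reads):

```shell
# environments/<name>.env.template
# Empty values document which secrets must be configured at deploy time.
CLICKHOUSE_PASSWORD=
EXTERNAL_API_KEY=
```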
## Bitbucket Deployment Variables

After merging, configure these repository variables in Bitbucket:

| Variable | Value | Scope |
|---|---|---|
| `SERVICE_IMAGE_NAME_<UPPER>` | `dagster-<kebab-name>` | Repository |
| `DAGSTER_BUILD_TAG_<UPPER>` | (set by CI pipeline) | Pipeline |
## What's Auto-Discovered

After Phase A refactoring (DAG-23), these files no longer need manual registration:

| Tool | Discovery mechanism |
|---|---|
| `uv run check-defs` | Scans `src/*/definitions.py` |
| `uv run dev-local` | Scans `.dagster_home/workspace.*.yaml`, auto-generates on first use |
| `bootstrap-clickhouse` | Scans `src/*/clickhouse/__init__.py` |

You can use `--only <name>` with the full package name immediately. Short-name aliases in `devtools/check_defs.py` and `devtools/dev_local.py` are optional CLI conveniences.
## Verification

Run these after completing all phases:

```shell
uv lock                                  # regenerate lockfile
uv sync --group dev                      # install everything locally
uv tree --package <kebab-name>           # confirm dependency isolation
uv run check-defs                        # definitions validate
uv run check-defs --only <name>          # single-location validation
uv run dev-local --only <name>           # loads in Dagster UI at localhost:3001
uv run dev-local --all                   # all locations load together
uv run pytest                            # tests pass
bootstrap-clickhouse --locations <name>  # ClickHouse schema created (if applicable)
```
## Common Pitfalls

- **Package name vs. PyPI name:** The directory and import name use snake_case (`underwriting_ml`), but the `pyproject.toml` `name` uses kebab-case (`underwriting-ml`). The `--package` flag in Dockerfiles and `uv` commands takes the kebab-case name.
- **Dockerfile pyproject stubs:** UV validates the full workspace lockfile structure during `uv sync --frozen`. Every workspace member's `pyproject.toml` must be COPY'd into the builder stage, even if you're only installing one package.
- **Port collisions:** Check `workspace.yaml` for the next free port before choosing one. The port must match in `workspace.yaml`, the Dockerfile (EXPOSE + CMD), and `compose.yml`.
- **ClickHouse modules must be dagster-free:** The `clickhouse/` subpackage is copied into the lightweight bootstrap image, which doesn't have dagster installed. Only import `polars`, `shared`, and the stdlib.
- **IO manager base_dir:** Each location writes to `{data_dir}/<name>`. Don't share a `base_dir` between locations; it causes asset key collisions.
## Architecture Notes

- **UV workspaces isolate dependencies per code location.** Each member's `pyproject.toml` declares only what it needs. Docker builds use `--package` to install only the target member's dependency tree.
- **`src/shared/` is infrastructure, not a code location.** It provides ClickHouse config, the schema mapper, the view generator, and bootstrap. All code locations depend on it.
- **Per-location ClickHouse databases:** Bootstrap creates one database per location. Cross-domain JOINs use fully-qualified `database.view` syntax via `db_map`.
- **Local dev via `dg dev`** runs code locations in-process. Production runs each location in a separate gRPC container discovered via `workspace.yaml`.
- **Multi-repo:** Splitting code locations into separate repositories is feasible in theory, but the current platform conventions (shared lockfile, workspace-aware Dockerfiles, Bitbucket pipeline anchors) make it a deliberate engineering project, not a config change.