Orchestration lessons

The Dagster definitions live at tutorial/serving/src/serving/definitions.py. Open the Dagster UI:

uv run --package tutorial-serving dagster dev -m serving.definitions

L01 — The asset graph

The asset graph is the most important artefact in this module. It traces from raw publisher landings, through dbt staging, to mart and editorial models, end-to-end. Every dbt model becomes a Dagster asset via dagster-dbt, so the same Python source-of-truth that defines the dbt project drives the orchestration.

Three publisher-level raw assets sit at the head of the graph:

raw/ebnerd
raw/adressa
raw/mind

Each is the dlt source from the ingestion module wrapped as a Dagster asset. The staging models depend on their publisher’s raw asset, so materialising any downstream asset pulls the required ingestion into the run automatically.

L02 — Asset checks

stg_unified_impressions carries two Dagster asset checks declared in definitions.py:

row_count — fails when the unified view drops below the expected threshold (so a publisher dropping out is loud rather than silent).
freshness — fails when the most recent event in the view is older than the freshness budget for the test run.

These aren’t dbt tests in disguise. They are Dagster-level checks that show up red in the UI when they fail, and they gate downstream materialisation.

L03 — Schedules and sensors

Two named schedules / sensors are wired:

A daily schedule targeting the whole asset graph. Useful for re-running everything cleanly on a fresh local copy.
A raw_download_sensor watching data/raw-download/ — when a new source file lands, the sensor triggers the matching publisher’s raw ingest asset.

Both are visible in the Dagster UI and runnable from there.

L04 — Tests

The orchestration code is exercised by test_dagster_orchestration.py, which instantiates the code location and verifies the expected assets, checks, schedule, and sensor are wired correctly. The test runs without a Dagster UI — orchestration is real, not a screenshot.

After this module

Orchestration is the gluing layer; nothing downstream consumes it directly. Continue to lakehouse / storage for the substrate the orchestration writes to, or to modeling for what reads from the orchestrated pipeline’s output.