Orchestration lessons

The Dagster definitions live at tutorial/serving/src/serving/definitions.py. Open the Dagster UI:

uv run --package tutorial-serving dagster dev -m serving.definitions

The asset graph is the most important artefact in this module. It traces end-to-end from raw publisher landings, through dbt staging, to the mart and editorial models. Every dbt model becomes a Dagster asset via dagster-dbt, so the same Python source-of-truth that defines the dbt project drives the orchestration.

Three publisher-level raw assets sit at the head of the graph:

  • raw/ebnerd
  • raw/adressa
  • raw/mind

Each is the dlt source from the ingestion module wrapped as a Dagster asset. The staging models depend on their publisher’s raw asset, so materialising any downstream asset pulls the required ingestion into the run automatically.
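
The dependency behaviour described above can be sketched in plain Python, independent of Dagster: given a map from each asset to its direct upstream assets, the set needed to materialise a target is the target plus everything it transitively depends on. Only the raw/* keys and stg_unified_impressions come from this page; the per-publisher staging names (stg_ebnerd and so on) and the required_assets helper are hypothetical placeholders, not the project's actual graph.

```python
from collections import deque

# Hypothetical dependency map: asset -> direct upstream assets.
# The raw/* keys and stg_unified_impressions are named in the text;
# the per-publisher staging names are assumed for illustration.
UPSTREAM = {
    "raw/ebnerd": [],
    "raw/adressa": [],
    "raw/mind": [],
    "stg_ebnerd": ["raw/ebnerd"],
    "stg_adressa": ["raw/adressa"],
    "stg_mind": ["raw/mind"],
    "stg_unified_impressions": ["stg_ebnerd", "stg_adressa", "stg_mind"],
}


def required_assets(target: str) -> set:
    """Return the target plus every asset it transitively depends on."""
    seen = {target}
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for parent in UPSTREAM[current]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


# A single publisher's staging model only needs that publisher's ingest.
print(sorted(required_assets("stg_ebnerd")))
# The unified view needs all three raw assets.
print(sorted(required_assets("stg_unified_impressions")))
```

In this sketch, asking for stg_ebnerd pulls in raw/ebnerd only, while asking for stg_unified_impressions pulls in all three publishers' ingestion.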

stg_unified_impressions carries two Dagster asset checks declared in definitions.py:

  • row_count — fails when the unified view drops below the expected threshold (so a publisher dropping out is loud rather than silent).
  • freshness — fails when the most recent event in the view is older than the freshness budget for the test run.

These aren’t dbt tests in disguise. They are Dagster-level checks that show up red in the UI when they fail, and they gate downstream materialisation.
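
The pass/fail rules behind the two checks can be sketched as plain Python functions. This is not the code in definitions.py: the threshold values, the CheckOutcome type, and the function signatures are assumptions made for illustration. In Dagster, logic like this would typically sit inside a function decorated with @asset_check and return an AssetCheckResult.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class CheckOutcome:
    """Hypothetical stand-in for a check result: pass/fail plus a message."""
    passed: bool
    detail: str


# Assumed values; the real threshold and budget live in definitions.py.
MIN_ROWS = 1_000
FRESHNESS_BUDGET = timedelta(days=1)


def row_count_check(row_count: int, min_rows: int = MIN_ROWS) -> CheckOutcome:
    """Fail when the unified view has fewer rows than the expected threshold."""
    return CheckOutcome(
        passed=row_count >= min_rows,
        detail=f"{row_count} rows, minimum {min_rows}",
    )


def freshness_check(
    latest_event: datetime,
    now: datetime,
    budget: timedelta = FRESHNESS_BUDGET,
) -> CheckOutcome:
    """Fail when the most recent event is older than the freshness budget."""
    age = now - latest_event
    return CheckOutcome(
        passed=age <= budget,
        detail=f"latest event is {age} old, budget {budget}",
    )


now = datetime(2024, 1, 2, tzinfo=timezone.utc)
print(row_count_check(250))                             # below the assumed threshold
print(freshness_check(now - timedelta(hours=3), now))   # within the assumed budget
```

Passing the current time in as an argument, rather than reading the clock inside the check, keeps the freshness rule deterministic under test.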

One schedule and one sensor are wired:

  • A daily schedule targeting the whole asset graph. Useful for re-running everything cleanly on a fresh local copy.
  • A raw_download_sensor watching data/raw-download/ — when a new source file lands, the sensor triggers the matching publisher’s raw ingest asset.

Both are visible in the Dagster UI and runnable from there.
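
The routing decision a sensor like raw_download_sensor has to make can be sketched without Dagster: map each newly landed file to its publisher's raw asset and skip files already handled. The directory layout data/raw-download/<publisher>/<file>, the helper names, and the use of the file path as a de-duplication key are all assumptions for illustration; the actual sensor may match files differently.

```python
from pathlib import PurePosixPath
from typing import List, Optional, Set, Tuple

WATCH_DIR = PurePosixPath("data/raw-download")
PUBLISHERS = ("ebnerd", "adressa", "mind")


def asset_for_file(path: str) -> Optional[str]:
    """Map a landed file to its publisher's raw asset key, or None if unmatched.

    Assumes the first directory under the watch directory names the publisher.
    """
    p = PurePosixPath(path)
    if WATCH_DIR not in p.parents:
        return None  # file is outside the watched directory
    rel = p.relative_to(WATCH_DIR)
    if len(rel.parts) < 2:
        return None  # file sits directly in the watch directory, no publisher folder
    publisher = rel.parts[0]
    return f"raw/{publisher}" if publisher in PUBLISHERS else None


def plan_runs(listing: List[str], already_seen: Set[str]) -> List[Tuple[str, str]]:
    """Return (run_key, asset_key) pairs for files not handled before."""
    runs = []
    for path in sorted(listing):
        if path in already_seen:
            continue
        asset = asset_for_file(path)
        if asset is not None:
            runs.append((path, asset))
    return runs


files = [
    "data/raw-download/ebnerd/behaviors.parquet",
    "data/raw-download/mind/train.zip",
]
print(plan_runs(files, already_seen={"data/raw-download/mind/train.zip"}))
```

In a Dagster sensor, each pair would typically become a RunRequest whose run_key prevents the same file from triggering a second run.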

The orchestration code is exercised by test_dagster_orchestration.py, which instantiates the code location and verifies that the expected assets, checks, schedule, and sensor are wired correctly. The test runs without a Dagster UI, so the wiring is verified in code rather than by inspecting the UI.

Orchestration is the glue layer; nothing downstream consumes it directly. Continue to lakehouse / storage for the substrate the orchestration writes to, or to modeling for what reads from the orchestrated pipeline’s output.