Transformation lessons
The transformation lessons walk through the dbt project at
tutorial/serving/dbt/.
Run any of these from the repository root (the paths are relative to it); --package tutorial-serving runs dbt in the serving package's environment:
uv run --package tutorial-serving dbt run --project-dir tutorial/serving/dbt --profiles-dir tutorial/serving/dbt
uv run --package tutorial-serving dbt test --project-dir tutorial/serving/dbt --profiles-dir tutorial/serving/dbt
uv run --package tutorial-serving dbt docs generate --project-dir tutorial/serving/dbt --profiles-dir tutorial/serving/dbt
L01 — Staging models per publisher
dbt/models/staging/
has one SQL file per publisher per source: stg_ebnerd_*, stg_adressa_*,
stg_mind_*. Each model is a rename + cast pass over the corresponding
raw dlt table. No semantic transformation happens here on purpose —
semantics belong downstream so the staging boundary stays cheap to test.
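The real staging models are SQL, but the shape of a rename + cast pass can be sketched in plain Python to show what the staging boundary does and, just as importantly, does not do. The raw column names (ImpressionId, UserId, Time, Clicked) and the staging names are hypothetical, not taken from the EB-NeRD, Adressa or MIND schemas.

```python
from datetime import datetime

# Hypothetical mapping: raw dlt column -> (staging column, cast function).
# Only renames and type casts; no filtering, joins or derived columns.
RENAME_AND_CAST = {
    "ImpressionId": ("impression_id", int),
    "UserId": ("user_id", str),
    "Time": ("impression_ts", datetime.fromisoformat),
    "Clicked": ("clicked", lambda v: str(v).lower() in {"1", "true"}),
}


def stage_row(raw: dict) -> dict:
    """Rename and cast one raw row according to RENAME_AND_CAST."""
    return {new: cast(raw[old]) for old, (new, cast) in RENAME_AND_CAST.items()}


raw = {"ImpressionId": "42", "UserId": "u7", "Time": "2023-05-18T07:32:00", "Clicked": "1"}
print(stage_row(raw))
```

Because each output column depends on exactly one input column, a test at this boundary only has to check names and types.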
The unified view, stg_unified_impressions.sql, UNIONs the three
publishers with a publisher column. Everything cross-publisher in the
rest of the platform reads from here.
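As a rough illustration, the union step amounts to stacking each publisher's staged rows and tagging every row with its source. This is a Python sketch, not the model itself: it keeps every row (UNION ALL behaviour; whether the real model uses UNION or UNION ALL is not stated here), and the sample rows and column names are hypothetical.

```python
def unify(staged_by_publisher: dict[str, list[dict]]) -> list[dict]:
    """Stack staged rows from every publisher, tagging each with its source.

    Keeps duplicate rows (UNION ALL semantics) and, loosely like SQL,
    refuses to combine rows whose column sets differ.
    """
    expected_columns = None
    unified = []
    for publisher, rows in staged_by_publisher.items():
        for row in rows:
            if expected_columns is None:
                expected_columns = set(row)
            elif set(row) != expected_columns:
                raise ValueError(
                    f"{publisher}: columns {sorted(row)} != {sorted(expected_columns)}"
                )
            unified.append({"publisher": publisher, **row})
    return unified


staged = {
    "ebnerd": [{"impression_id": 1, "clicked": True}],
    "adressa": [{"impression_id": 1, "clicked": False}],
    "mind": [{"impression_id": 2, "clicked": False}],
}
for row in unify(staged):
    print(row)
```

In the sample, impression_id 1 occurs for two publishers; the added publisher column is what keeps those rows distinguishable downstream.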
L02 — Editorial models
dbt/models/editorial/
materialises two things that bridge editorial intent and platform data:
- constraint_configurations.sql: the configuration table that editors write to through the editor interface and that the ranker reads on every recommendation request. Its columns match ADR-0015 exactly.
- article_sensitivity.py: a dbt Python model that combines EB-NeRD’s sentiment scores with a small NER-driven keyword pass into a boolean is_sensitive per article. This is the boundary where “editorial guard” becomes a queryable platform fact.
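The per-article logic of article_sensitivity.py might look roughly like the sketch below. Everything specific here is an assumption: the label values ("Negative"), the 0.8 threshold, the keyword list, and the rule that either signal alone is enough. A plain lowercase token match stands in for the NER-driven keyword pass, and the dbt model wrapper is omitted.

```python
import re

# Hypothetical parameters; the real threshold and keyword list are not given.
NEGATIVE_THRESHOLD = 0.8
SENSITIVE_KEYWORDS = {"death", "killed", "war", "suicide", "assault"}


def is_sensitive(sentiment_label: str, sentiment_score: float, text: str) -> bool:
    """Flag an article if it is strongly negative OR mentions a sensitive keyword."""
    strongly_negative = (
        sentiment_label == "Negative" and sentiment_score >= NEGATIVE_THRESHOLD
    )
    # Stand-in for the NER pass: lowercase word tokens, including Danish letters.
    tokens = set(re.findall(r"[a-zæøå]+", text.lower()))
    keyword_hit = bool(tokens & SENSITIVE_KEYWORDS)
    return strongly_negative or keyword_hit


print(is_sensitive("Positive", 0.9, "Two killed in crash"))
print(is_sensitive("Neutral", 0.5, "New bakery opens"))
```

Keeping the rule a pure function of a few columns is what makes it easy to pin with seeded test cases.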
L03 — Article embeddings as a dbt model
dbt/models/staging/article_embeddings.py
runs the sentence-transformer once per article and writes the embedding
column into Parquet. Because it lives as a dbt Python model rather than an
ad-hoc script, the embeddings get the same lineage tracking, tests, and
docs treatment as every other column in the analytical contract.
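The "once per article" property can be sketched independently of the model: deduplicate by article id, then encode each distinct article in a single batch. To stay runnable without sentence-transformers, fake_encode below is a hypothetical stand-in that produces deterministic but meaningless vectors; in practice SentenceTransformer(...).encode could be passed as encode (it returns a NumPy array, which zip iterates row by row). Writing the Parquet output is left out.

```python
import hashlib
from typing import Callable, Iterable


def fake_encode(texts: list[str], dim: int = 4) -> list[list[float]]:
    """Stand-in for a sentence-transformer: deterministic, NOT semantic."""
    vectors = []
    for text in texts:
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        vectors.append([b / 255 for b in digest[:dim]])
    return vectors


def embed_articles(
    articles: Iterable[tuple[str, str]],
    encode: Callable[[list[str]], list[list[float]]],
) -> dict[str, list[float]]:
    """Encode each distinct article_id exactly once, in a single batch."""
    unique: dict[str, str] = {}
    for article_id, text in articles:
        unique.setdefault(article_id, text)  # first text wins for duplicate ids
    ids = list(unique)
    vectors = encode([unique[i] for i in ids])
    return dict(zip(ids, vectors))


rows = [("a1", "Text one"), ("a2", "Text two"), ("a1", "Text one")]
print(sorted(embed_articles(rows, fake_encode)))
```

Passing the encoder in as a parameter keeps the deduplication logic testable without loading a model.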
L04 — Tests
dbt test runs every test in schema.yml plus the custom assertion at
dbt/tests/assert_article_sensitivity_seeded_cases.sql.
The custom test pins known-sensitive seed articles to is_sensitive = true
and known-benign ones to false, so a regression in the sensitivity
heuristic makes dbt test fail loudly.
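A dbt singular test passes when its query returns zero rows, so the custom assertion is essentially "select the seeded articles whose is_sensitive disagrees with the expected value". The same check in Python, with hypothetical article ids; unlike an inner join in SQL, this sketch also reports a seeded article that is missing from the model output.

```python
from typing import Optional


def seeded_case_failures(
    expected: dict[str, bool], actual: dict[str, bool]
) -> list[tuple[str, bool, Optional[bool]]]:
    """Return (article_id, expected, actual) for every violated seed case.

    An empty list means the test passes, mirroring a dbt singular test
    that returns zero rows.
    """
    failures = []
    for article_id, want in sorted(expected.items()):
        got = actual.get(article_id)  # None if the article is missing
        if got != want:
            failures.append((article_id, want, got))
    return failures


seeds = {"a1": True, "a2": False}
print(seeded_case_failures(seeds, {"a1": False, "a2": False}))
```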
L05 — Generated docs as the analyst’s reference
dbt docs generate produces the lineage graph and column-level
documentation that the docs site embeds at Data reference.
This makes the “analyst” half of the two-contracts argument (ADR-0006)
operational: analysts don’t need to read Python to understand the schema.
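The same information is available as JSON in the manifest.json artifact that dbt writes to the project's target directory. A minimal sketch that extracts each model's documented columns and upstream nodes, assuming the default target-path (the path in load_manifest is an assumption) and using a hand-built, heavily trimmed manifest whose node names and project name are hypothetical:

```python
import json
from pathlib import Path


def column_reference(manifest: dict) -> dict[str, dict]:
    """Summarise each model: column descriptions and upstream node ids."""
    summary = {}
    for node in manifest.get("nodes", {}).values():
        if node.get("resource_type") != "model":
            continue  # skip tests, seeds, snapshots, etc.
        summary[node["name"]] = {
            "columns": {
                name: col.get("description", "")
                for name, col in node.get("columns", {}).items()
            },
            "upstream": node.get("depends_on", {}).get("nodes", []),
        }
    return summary


def load_manifest(path: str = "tutorial/serving/dbt/target/manifest.json") -> dict:
    """Read the manifest written by dbt (assumed default target-path)."""
    return json.loads(Path(path).read_text())


sample = {
    "nodes": {
        "model.serving.stg_unified_impressions": {
            "resource_type": "model",
            "name": "stg_unified_impressions",
            "columns": {"publisher": {"name": "publisher", "description": "Source publisher"}},
            "depends_on": {"nodes": ["model.serving.stg_mind_impressions"]},
        },
        "test.serving.not_null_publisher": {"resource_type": "test", "name": "not_null_publisher"},
    }
}
print(column_reference(sample))
```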
After this module
Embeddings flow into modeling, the constraint configurations and sensitivity flag flow into editorial, and orchestration wraps the whole dbt project as Dagster assets so a single materialisation pulls upstream ingest in automatically.