Foundations lessons
L01 — News events in DuckDB
Section titled “L01 — News events in DuckDB”The file:
tutorial/foundations/lesson-01-news-events.sql.
Run it directly:
duckdb < tutorial/foundations/lesson-01-news-events.sqlWhat runs
Section titled “What runs”The lesson is one SQL file. Reading top to bottom:
- Drop + create the
news_eventstable from a literalVALUESblock: three users, six articles, fourteen events, two event types (impression,click), five topics. - Create
article_metrics— a view computing impressions, clicks, and CTR per article. Note thenullif(...,0)guard against zero-impression divides. - Create
user_topic_affinity— a view computing impressions, clicks, and topic-level CTR per (user, topic). This is the simplest cold-start signal in the tutorial. - Print three labelled sections of the underlying tables for visual inspection: raw events, article metrics, user-topic affinity.
- Compute a starter candidate score for every (user, article) pair the
user has not already seen:
Articles already shown to the user are excluded viastarter_score = 0.7 · user_topic_ctr + 0.3 · article_ctr
LEFT JOIN ... WHERE seen.article_id IS NULL. - End with a checkpoint question — answer it from the output: Which user has the strongest sports signal, and which unseen article would the starter score rank highest for that user?
What the lesson teaches that later modules rely on
Section titled “What the lesson teaches that later modules rely on”- The two-event vocabulary (
impressionandclick). Every downstream metric in evaluation reduces to these two atoms. - Per-article CTR with a divide-by-zero guard. That guard pattern reappears every time a metric divides by impression counts.
- The candidate-set shape. Already-seen articles are excluded before ranking, not as a separate filter step. The modeling module keeps the same boundary when it swaps the starter score for cosine similarity over sentence embeddings.
- Checkpoint questions as the lesson’s correctness gate. A lesson is not “run; it printed something.” A lesson is “run; the printed output answers the checkpoint.” That discipline scales to every later module.
What the lesson deliberately does not do
Section titled “What the lesson deliberately does not do”- No Python. No dependencies beyond DuckDB.
- No real data — the
VALUESblock is small enough that you can read it. - No persistence beyond the views (
CREATE VIEW, notCREATE TABLE AS). Reset by re-running.
After this lesson
Section titled “After this lesson”Run ingestion when you are ready to replace the
fourteen rows with millions. The data shape stays exactly the same; the
column names stay the same; only the source changes.