Assistants lessons
The assistants lessons walk the change-explainer agent end-to-end, plus the
five-scorer Evalite rubric that gates it in CI. All code lives under
editor/
because ADR-0020
puts editorial assistants in the TypeScript half — behind editor
endpoints, not as freelance AI features bolted onto the recommender.
L01 — Evalite bootstrap
evalite, vitest, and autoevals are wired into the editor workspace
via
editor/evalite.config.ts.
Run the full eval:
pnpm --filter editor eval
Run the Evalite UI:
pnpm --filter editor eval:dev
L02 — The change-explainer agent
The agent lives at
editor/src/agents/change-explainer.ts
with the prompt at
editor/src/agents/prompt.ts.
It takes:
{ before: RankedList, after: RankedList, constraint_diff, platform_facts }
and returns:
{ headline, moved_down[], moved_up[], summary }
The LLM call is wrapped with traceAISDKModel so traces appear in the
Evalite UI for every fixture row.
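To make the input and output shapes concrete, here is a minimal sketch of how they might be typed. Only the top-level field names come from the lesson; the item fields (`rank`, `reason`), the shapes of `constraint_diff` and `platform_facts`, and the `diffRanks` helper are assumptions for illustration, not the contents of change-explainer.ts.

```typescript
// Hypothetical shapes: only the top-level field names come from the lesson text.
interface RankedItem {
  article_id: string;
  source: string;
  topic: string;
  rank: number; // assumed: 1-based position in the list
}

type RankedList = RankedItem[];

interface ExplainerInput {
  before: RankedList;
  after: RankedList;
  constraint_diff: Record<string, { from: number; to: number }>; // assumed shape
  platform_facts: Record<string, unknown>; // assumed shape
}

interface Move {
  article_id: string;
  reason: string; // assumed: one short reason per moved item
}

interface ExplainerOutput {
  headline: string;
  moved_down: Move[];
  moved_up: Move[];
  summary: string;
}

// Illustrative helper: which articles rose or fell between the two lists.
function diffRanks(
  before: RankedList,
  after: RankedList,
): { up: string[]; down: string[] } {
  const oldRank = new Map(
    before.map((item) => [item.article_id, item.rank] as const),
  );
  const up: string[] = [];
  const down: string[] = [];
  for (const item of after) {
    const prev = oldRank.get(item.article_id);
    if (prev === undefined) continue; // new entrant: neither up nor down
    if (item.rank < prev) up.push(item.article_id);
    else if (item.rank > prev) down.push(item.article_id);
  }
  return { up, down };
}
```

A helper like `diffRanks` gives the deterministic ground truth that the agent's `moved_up` and `moved_down` arrays can be compared against.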
L03 — Programmatic fixtures
editor/src/fixtures/generator.ts
generates fixtures by calling the FastAPI /preview endpoint twice (once
for the “before” config, once for “after”) for each of N×M (user,
constraint-diff) combinations. Default 40 fixtures. Seedable — same
seed = same fixtures, every time. CI reproducibility is the discipline
this generator enforces.
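The seeding discipline can be sketched as follows. This is an illustration under stated assumptions, not the code in generator.ts: it uses the well-known mulberry32 PRNG and a Fisher-Yates shuffle to pick (user, constraint-diff) pairs, and it omits the two /preview calls per pair. The names `planFixtures` and `FixturePlan`, and the idea of sampling 40 pairs from the N×M grid by shuffling, are hypothetical.

```typescript
// mulberry32: a small deterministic PRNG. Same seed, same sequence.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical plan entry: which user and which constraint diff to preview.
interface FixturePlan {
  userId: string;
  diffId: string;
}

// Build the N×M grid, shuffle it with the seeded PRNG, keep the first `count`.
function planFixtures(
  userIds: string[],
  diffIds: string[],
  count: number,
  seed: number,
): FixturePlan[] {
  const all: FixturePlan[] = [];
  for (const userId of userIds) {
    for (const diffId of diffIds) {
      all.push({ userId, diffId });
    }
  }
  const rand = mulberry32(seed);
  // Fisher-Yates shuffle driven only by the seeded generator.
  for (let i = all.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    const tmp = all[i]!;
    all[i] = all[j]!;
    all[j] = tmp;
  }
  return all.slice(0, count);
}
```

Because nothing in the selection reads the clock or `Math.random`, two CI runs with the same seed request the same pairs in the same order.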
The
editor/fixtures/curated/README.md
file is the explicit extension point for hand-pinned cases from a domain
expert (per the feedback-showcase-humility memory). It’s empty in v1 —
the tutorial does not claim editorial expertise.
L04 — Four deterministic scorers
editor/src/scorers/
holds the deterministic rubric:
| Scorer | What it checks | Source |
|---|---|---|
| grounded-entities | Every article_id / source / topic cited in the output appears in before or after. Score = 1 - hallucinated/total | grounded-entities.ts |
| reason-validity | Each move-up / move-down reason is verifiable against platform_facts | reason-validity.ts |
| constraint-coverage | The changed constraint is named in the summary. Binary. | constraint-coverage.ts |
| length | summary ≤ 60 words; per-item reasons ≤ 25 words | length.ts |
Each scorer is a pure function with its own Vitest unit tests.
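As an illustration of what "pure function" means here, the sketch below implements two of the rubric rows from their descriptions in the table. The function signatures, the `Entity` type, the handling of an empty citation list, and treating the length check as binary are assumptions; the real scorers in editor/src/scorers/ may differ.

```typescript
// Hypothetical entity shape: the three fields the table says can be cited.
interface Entity {
  article_id: string;
  source: string;
  topic: string;
}

// grounded-entities: score = 1 - hallucinated / total, per the table.
function groundedEntities(
  cited: string[],
  before: Entity[],
  after: Entity[],
): number {
  const known = new Set<string>();
  for (const e of [...before, ...after]) {
    known.add(e.article_id);
    known.add(e.source);
    known.add(e.topic);
  }
  // Assumption: an output that cites nothing has hallucinated nothing.
  if (cited.length === 0) return 1;
  const hallucinated = cited.filter((c) => !known.has(c)).length;
  return 1 - hallucinated / cited.length;
}

// Whitespace-delimited word count.
function wordCount(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// length: summary <= 60 words and every reason <= 25 words.
// Assumption: scored as pass/fail (1 or 0).
function lengthScore(summary: string, reasons: string[]): number {
  const ok =
    wordCount(summary) <= 60 && reasons.every((r) => wordCount(r) <= 25);
  return ok ? 1 : 0;
}
```

Neither function touches the network, the clock, or a model, which is what makes them cheap to unit-test in Vitest.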
L05 — LLM-as-judge: editorial register
editor/src/scorers/editorial-register.ts
is a fifth scorer, built with createScorer from Evalite. It judges whether the
summary reads the way a Danish news editor would write it: concise, specific,
no AI-slop, no hedging.
Gated: only runs when grounded-entities ≥ 0.95. Otherwise returns
n/a and is excluded from the row’s average. Avoids judging gibberish
for register, and saves model spend on rows that are already failing.
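The gating rule can be sketched without any Evalite or model calls. In this illustration the judge is passed in as a plain synchronous callback to keep the example short (a real model call would be async), and the names `gatedRegister` and `rowAverage` are hypothetical rather than taken from editorial-register.ts.

```typescript
// A score is either a number in [0, 1] or "n/a" when the scorer was skipped.
type Score = number | "n/a";

// Gate value taken from the lesson text.
const GROUNDED_GATE = 0.95;

// Only consult the (expensive) judge when the row is already well grounded.
function gatedRegister(
  grounded: number,
  summary: string,
  judge: (summary: string) => number, // hypothetical stand-in for the LLM judge
): Score {
  if (grounded < GROUNDED_GATE) return "n/a";
  return judge(summary);
}

// Row average over numeric scores only; "n/a" entries are excluded.
function rowAverage(scores: Score[]): number {
  const nums = scores.filter((s): s is number => typeof s === "number");
  if (nums.length === 0) return 0; // assumption: a row with no scores counts as 0
  return nums.reduce((sum, s) => sum + s, 0) / nums.length;
}
```

Excluding n/a from the denominator means a skipped judge neither rewards nor punishes the row; the low grounded-entities score already does the punishing.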
L06 — The Evalite suite + CI threshold
editor/evals/change-explainer.eval.ts
wires fixtures + agent + scorers. The CI threshold:
- Average across all scorers ≥ 0.85
- grounded-entities ≥ 0.95 on every row (one hallucination = suite failure)
Threshold enforcement lives in
editor/scripts/eval-summary.ts
— a wrapper that exits non-zero when the rubric fails, so a regression
shows up as a red CI build.
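A minimal sketch of the threshold check follows. It is not the code in eval-summary.ts: the `RowScores` and `Verdict` types are hypothetical, and it assumes the suite average is the mean of per-row averages, which the lesson does not specify. The function returns an exit code instead of exiting so that it stays a pure, testable function.

```typescript
// Hypothetical per-row summary produced by the eval run.
interface RowScores {
  groundedEntities: number;
  average: number; // row average with n/a scorers already excluded
}

interface Verdict {
  pass: boolean;
  failures: string[];
}

// Apply both CI rules: a per-row floor on grounded-entities and a suite-wide average.
function checkThresholds(
  rows: RowScores[],
  minAverage = 0.85,
  minGrounded = 0.95,
): Verdict {
  const failures: string[] = [];
  rows.forEach((row, i) => {
    if (row.groundedEntities < minGrounded) {
      failures.push(
        `row ${i}: grounded-entities ${row.groundedEntities} < ${minGrounded}`,
      );
    }
  });
  // Assumption: suite average = mean of row averages; an empty suite fails.
  const mean =
    rows.length === 0
      ? 0
      : rows.reduce((sum, r) => sum + r.average, 0) / rows.length;
  if (mean < minAverage) {
    failures.push(`suite average ${mean} < ${minAverage}`);
  }
  return { pass: failures.length === 0, failures };
}

// Map the verdict to a process exit code: non-zero turns the CI build red.
function exitCodeFor(verdict: Verdict): number {
  return verdict.pass ? 0 : 1;
}
```

A wrapper script would print `failures` and pass the result of `exitCodeFor` to the process exit call.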
L07 — Editor integration
When an editor moves a slider, two HTMX partials fire in parallel: the
recommendations partial AND the change-explainer partial. The partial
route is
src/routes/change-explainer-partial.ts.
The editor sees the new list and the editorial reason for the change side
by side.
The dbt/Evalite split
The deterministic constraint metrics (diversity, recency, source mix, sensitive-topic exposure) stay in dbt because they are reproducible checks over materialised platform outputs. The non-deterministic assistant behaviours (faithfulness, register, intent-translation accuracy) live in Evalite. The split is ADR-0020.
Planned future suites
The same pattern applies to two more editorial assistants, deferred to a later phase:
- constraint-translator: turns editor intent (“make today’s front more varied”) into specific constraint weight changes.
- alternative-suggester: proposes alternative slates for a given context.
Both are explicitly named in ADR-0020 as not-yet-built so a reader knows they aren’t lost work.