Platform as leverage
The editorial cost of clicks is that a newsroom can optimise itself into a narrower, faster, more reactive version of its own judgment without ever making that decision explicitly. A click-optimised recommender does not need to be malicious, careless, or technically crude to create that cost. It only needs to be successful at the narrow objective it was given. If the platform measures attention but does not also make diversity, freshness, sentiment balance, promotion, and sensitive-topic exposure visible, the editorial tradeoff still happens. It just happens in the dark.
I argue that editorial accountability is a platform-layer concern, not a model-layer concern. That is the core thesis recorded in ADR-0004, and it is the reason this tutorial keeps the recommender model deliberately simple. The point is not to prove that a small content-similarity model beats production newsroom recommenders. It will not. The point is to show that the most important editorial questions around recommender systems often sit outside the model: Who can see the tradeoffs? Who can tune them? Which constraints are hard promises rather than weighted preferences? Which surface does an editor use? Which contract does an analyst query? Which evidence makes the cost of a change visible before the change is shipped?
That distinction matters because recommender systems invite a false centre of gravity. It is natural to ask which model is cleverest. It is less natural, but more useful for an accountable media product, to ask which platform makes editorial judgment operational. A newsroom does not only need a ranked list. It needs a way to reason about what that list is doing to coverage, tone, recency, and reader exposure. It needs a way to compare a click-only configuration with a balanced configuration. It needs the current settings to be inspectable. It needs a future editor, analyst, or manager to reconstruct why the system behaved as it did. Those are platform properties.
The implementation shape follows from that position. The model produces a candidate set, and the ranker turns that candidate set into a final list under an editorial constraint configuration. In plain English: the model says “these articles look relevant to this reader”; the candidate set is the pool of articles worth considering; the ranker says “given the current editorial policy, this is the order we are willing to show.” Keeping those responsibilities separate is what gives the platform leverage. The same candidate set can be ranked one way for a click-only baseline, another way for a high-diversity configuration, and another way for a breaking-news setting with stronger freshness. The model does not have to be retrained for each editorial stance.
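A minimal sketch of that separation, assuming hypothetical names (Candidate, RankerConfig, rank) and a made-up freshness decay that is not taken from the ADRs. It only illustrates that one candidate set can be ordered differently by swapping the configuration, with no retraining step in between.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Candidate:
    article_id: str
    relevance: float          # model output: how relevant this looks for the reader
    published_at: datetime


@dataclass(frozen=True)
class RankerConfig:
    name: str
    recency_weight: float     # 0.0 reproduces a relevance-only (click-style) ordering


def rank(candidates, config, now):
    """Pure function: candidate set + configuration -> ranked article ids."""
    def score(c):
        age_hours = (now - c.published_at).total_seconds() / 3600
        freshness = 1.0 / (1.0 + age_hours)   # hypothetical decay, value in (0, 1]
        return c.relevance + config.recency_weight * freshness
    return [c.article_id for c in sorted(candidates, key=score, reverse=True)]


NOW = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
CANDIDATES = [
    Candidate("feature", 0.90, datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)),
    Candidate("breaking", 0.60, datetime(2024, 1, 2, 11, 0, tzinfo=timezone.utc)),
]
CLICK_ONLY = RankerConfig("click_only", recency_weight=0.0)
BREAKING_NEWS = RankerConfig("breaking_news", recency_weight=1.0)

print(rank(CANDIDATES, CLICK_ONLY, NOW))      # ['feature', 'breaking']
print(rank(CANDIDATES, BREAKING_NEWS, NOW))   # ['breaking', 'feature']
```

The model's relevance scores are identical in both calls. Only the configuration changes, which is the leverage the paragraph above describes.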
ADR-0007 makes the model intentionally modest: content similarity over article text, with a cold-start path when the platform has no useful read history for a user. That is enough to create plausible candidates for the teaching platform. It is not enough to claim deep recommender research. That limitation is useful. It keeps the tutorial honest and puts the architecture under pressure in the right place. If the platform argument only works when the model is impressive, then the platform is not doing much work. The stronger claim is that even a simple model becomes more useful, inspectable, and governable when the surrounding platform exposes the right contracts and controls.
The current ranker slice is small on purpose. The implemented deep module is a pure function from candidate set and configuration to ranked list, with topical diversity, recency, sentiment balance, editorial promotion, and the sensitive-topic cap now wired end to end. That means the code already demonstrates the main shape of the architecture without pretending the full constraint catalogue has landed. ADR-0010 names the enforcement model: hard rules plus weighted soft constraints. ADR-0015 gives the math. Soft constraints such as topical diversity, recency, and sentiment balance contribute weighted terms to the score. Hard rules such as editorial promotion and sensitive-topic caps are not allowed to vanish inside a weighted average. They are promises the platform must enforce as filters or forced inclusions.
That mixed enforcement model is the editorial heart of the design. An editor should be able to tune a diversity weight as a matter of judgment: during a one-topic news day, more same-topic concentration may be justified; on a quieter day, a broader front can be healthier. Recency is similar. A breaking-news period calls for a different freshness bias than a weekend feature package. Sentiment balance also belongs in the soft family because the target is editorially contextual rather than universal. The platform should expose those settings as sliders or explicit values, not bury them inside code.
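To make the idea of a tunable soft constraint concrete, here is a small sketch of a diversity weight acting as a penalty term during a greedy re-rank. The function name, the tuple layout, and the penalty formula are assumptions for illustration; the actual scoring math lives in ADR-0015 and may differ.

```python
def rerank_with_diversity(candidates, diversity_weight):
    """Greedy re-rank over (article_id, topic, relevance) tuples.

    Each pick is penalised by diversity_weight for every earlier pick
    that shares its topic. A weight of 0.0 leaves pure relevance order.
    """
    remaining = list(candidates)
    picked, topic_counts = [], {}
    while remaining:
        best = max(
            remaining,
            key=lambda c: c[2] - diversity_weight * topic_counts.get(c[1], 0),
        )
        remaining.remove(best)
        picked.append(best[0])
        topic_counts[best[1]] = topic_counts.get(best[1], 0) + 1
    return picked


ARTICLES = [
    ("pol-1", "politics", 0.9),
    ("pol-2", "politics", 0.8),
    ("sport-1", "sport", 0.7),
]

print(rerank_with_diversity(ARTICLES, 0.0))   # ['pol-1', 'pol-2', 'sport-1']
print(rerank_with_diversity(ARTICLES, 0.2))   # ['pol-1', 'sport-1', 'pol-2']
```

On a one-topic news day an editor might leave the weight near zero. On a quieter day, raising it pulls the second politics story below the sport story without anyone touching the model.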
Promotion and sensitivity are different. A promoted investigation has to appear because the newsroom has made a conscious editorial choice. A sensitive-topic guard has to cap exposure because the newsroom has decided that vulnerable or distressing material should not dominate a reader’s list. Those rules may still need careful design, but they should not be merely “low weight” or “high weight” hints to a score function. ADR-0010 is valuable because it refuses the elegant but irresponsible simplification that every editorial concern can be represented as one more numeric preference.
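The difference between a weighted hint and a hard promise can be sketched as follows. This is an illustration under stated assumptions, not the platform's ranker: the names (Scored, apply_hard_rules) are hypothetical, promoted articles are placed first as one possible form of forced inclusion, and the sensitive-topic cap is assumed to apply to promoted items too.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scored:
    article_id: str
    score: float              # soft-constraint score, already weighted
    sensitive: bool = False


def apply_hard_rules(scored, promoted_ids, sensitive_cap, list_size):
    """Enforce hard rules around a soft-scored list.

    promoted_ids: articles that must appear, whatever their score.
    sensitive_cap: maximum number of sensitive articles in the final list.
    """
    by_score = sorted(scored, key=lambda s: s.score, reverse=True)
    promoted = [s for s in by_score if s.article_id in promoted_ids]
    rest = [s for s in by_score if s.article_id not in promoted_ids]

    final, sensitive_count = [], 0
    for item in promoted + rest:
        if len(final) == list_size:
            break
        if item.sensitive:
            if sensitive_count >= sensitive_cap:
                continue      # filtered out regardless of score
            sensitive_count += 1
        final.append(item.article_id)
    return final


SCORED = [
    Scored("crime-1", 0.95, sensitive=True),
    Scored("crime-2", 0.90, sensitive=True),
    Scored("sport", 0.70),
    Scored("culture", 0.60),
    Scored("investigation", 0.10),
]

print(apply_hard_rules(SCORED, {"investigation"}, sensitive_cap=1, list_size=3))
# ['investigation', 'crime-1', 'sport']
```

The promoted investigation has the lowest soft score and still appears, and the second crime story has the second-highest score and still does not. Neither outcome could be guaranteed by adjusting a weight.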
The data substrate matters because accountability without a queryable record is theatre. ADR-0016 frames DuckDB over Parquet as the local-first analytical platform, not just a convenient development database. That matters for two reasons. First, the platform can be exercised on small fixtures and then scaled to large partitioned data without changing the conceptual boundary. Second, analysts can inspect the same underlying data that serves the application. The article tables, impression views, constraint configurations, and evaluation outputs are not private internals of an app. They are the analytical contract of the platform.
The companion app contract serves a different audience. Editors and application clients should not talk directly to the analytical tables. They need stable HTTP endpoints, typed request and response shapes, and an interface that feels like an editorial tool rather than a notebook. Analysts, on the other hand, should not be forced through an HTTP layer when the natural question is SQL-shaped. The two-contract design in ADR-0006 is not an implementation detail; it is part of the leverage story. Different consumers get different surfaces over the same platform, rather than different systems with drifting definitions.
This is also why the editor interface is deliberately ordinary. Sliders, forms, previews, and saved configurations are not glamorous, but they make responsibility usable. A newsroom cannot govern a recommender through architectural intent alone. It needs a current configuration table, a preview endpoint, an audit trail, and visible effects. The editor should be able to change a diversity weight and see the recommendation list move before committing the setting. The analyst should later be able to compare that configuration against click metrics and editorial metrics. The platform is the place where those two workflows meet.
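What "see the recommendation list move" could mean in practice is sketched below: a small helper that compares the list under the current configuration with the list under a proposed one. The function name and return shape are assumptions for illustration, not the preview endpoint's actual response.

```python
def preview_moves(current, proposed):
    """Compare two ranked lists of article ids.

    Returns (moves, dropped): moves maps each article in the proposed list
    to the number of places it moved up (+) or down (-), or "new" if it was
    not in the current list; dropped lists articles that disappeared.
    """
    old_pos = {article_id: i for i, article_id in enumerate(current)}
    proposed_ids = set(proposed)
    moves = {}
    for new_i, article_id in enumerate(proposed):
        if article_id in old_pos:
            moves[article_id] = old_pos[article_id] - new_i
        else:
            moves[article_id] = "new"
    dropped = [a for a in current if a not in proposed_ids]
    return moves, dropped


CURRENT_LIST = ["pol-1", "pol-2", "sport-1"]
PROPOSED_LIST = ["pol-1", "sport-1", "culture-1"]

moves, dropped = preview_moves(CURRENT_LIST, PROPOSED_LIST)
print(moves)     # {'pol-1': 0, 'sport-1': 1, 'culture-1': 'new'}
print(dropped)   # ['pol-2']
```

Because the comparison works on two plain lists, it can sit behind a preview surface without the setting ever being saved.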
Evaluation is where the cost of clicks becomes explicit. A standard recommender report can say whether the system predicted clicks well. That is useful, but incomplete. ADR-0009 adds a second family of metrics: diversity, coverage, recency, sentiment distribution, and sensitive-topic exposure. The headline visual is a Pareto frontier, not because a chart is impressive, but because it forces the tradeoff into view. A click-only configuration can be plotted next to balanced or high-diversity configurations. If the click-only point gains a little accuracy while losing a lot of coverage, that is no longer a hidden side effect. It is an editorial choice.
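The frontier itself is simple to compute. The sketch below assumes each configuration has been reduced to one click metric and one editorial metric, both higher-is-better; the configuration names and numbers are invented for illustration and are not results from the tutorial's datasets.

```python
def pareto_frontier(points):
    """Return names of configurations not dominated on (clicks, editorial).

    A point is dominated when another point is at least as good on both
    metrics and strictly better on at least one. Higher is better for both.
    """
    frontier = []
    for name, clicks, editorial in points:
        dominated = any(
            c >= clicks and e >= editorial and (c > clicks or e > editorial)
            for _, c, e in points
        )
        if not dominated:
            frontier.append(name)
    return frontier


# Illustrative values only: (configuration, click metric, editorial metric)
CONFIGS = [
    ("click_only", 0.70, 0.30),
    ("balanced", 0.68, 0.55),
    ("high_diversity", 0.60, 0.70),
    ("misconfigured", 0.58, 0.50),
]

print(pareto_frontier(CONFIGS))
# ['click_only', 'balanced', 'high_diversity']
```

Every point on the frontier is a defensible tradeoff; choosing among them is the editorial decision. A point off the frontier, like the hypothetical misconfigured one here, gives up something on both axes and is simply a worse setting.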
The platform has to stay honest about publisher context too. EB-NeRD is a strong primary dataset for this tutorial because it comes from Ekstra Bladet, which sits inside the same broad JP/Politikens Hus media context that motivated the work. Adressa and MIND add comparison points, but they do not magically turn a tutorial into production newsroom experience. This work does not claim production newsroom recommender experience and does not overclaim JP/Politikens Hus domain depth. It uses public Scandinavian and news-recommendation datasets to make the learning credible, then leans on the platform and product engineering strengths the role actually rewards.
That honesty is strategic, not apologetic. The JP/Politikens Hus positioning context is precisely why the model is not the star. A stronger model would not automatically produce a stronger artefact for this purpose. It might even weaken it by inviting evaluation as recommender research rather than as platform judgment. The useful signal is that the candidate can learn the media-specific problem space quickly, name the limits of the work, and build a coherent platform around those limits. A manager hiring for data science leadership should care about that ability: it is the difference between shipping a clever component and creating a system the organisation can operate.
The cross-publisher staging work reinforces the same idea. EB-NeRD, Adressa, and MIND do not share exactly the same raw shape. Article identifiers, category structures, impression records, click semantics, and sentiment fields differ. The platform absorbs those differences at the staging boundary and exposes canonical downstream views where it can, while preserving publisher identity where the distinction matters. That is not just data plumbing. It is how a platform keeps comparison honest. A sentiment metric cannot pretend Adressa and MIND have EB-NeRD’s sentiment column. A dwell-time metric cannot pretend every publisher records the same event semantics. Graceful degradation is an accountability feature.
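One way to picture graceful degradation is a metric that returns "not available" rather than a misleading number when a publisher has no sentiment data. The record shape, field name, and label values below are assumptions for illustration, not the staging schema.

```python
from collections import Counter


def sentiment_distribution(articles):
    """Share of each sentiment label, or None if no sentiment data exists.

    Returning None keeps a missing column distinguishable from a
    genuinely empty or one-sided distribution.
    """
    labels = [a["sentiment"] for a in articles if a.get("sentiment") is not None]
    if not labels:
        return None
    counts = Counter(labels)
    return {label: n / len(labels) for label, n in counts.items()}


# Hypothetical records: one publisher with sentiment labels, one without.
WITH_SENTIMENT = [
    {"article_id": "a1", "sentiment": "Negative"},
    {"article_id": "a2", "sentiment": "Positive"},
    {"article_id": "a3", "sentiment": "Negative"},
    {"article_id": "a4", "sentiment": "Neutral"},
]
WITHOUT_SENTIMENT = [{"article_id": "b1"}, {"article_id": "b2"}]

print(sentiment_distribution(WITH_SENTIMENT))
# {'Negative': 0.5, 'Positive': 0.25, 'Neutral': 0.25}
print(sentiment_distribution(WITHOUT_SENTIMENT))   # None
```

A report built on this shape can show a blank cell for the publisher without sentiment data, which is more honest than a zero.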
There is a leadership argument underneath the technical one. Editorial accountability cannot depend on one specialist knowing where the ranker hides a rule. It has to be documented in ADRs, surfaced through contracts, tested at the deep-module boundary, and visible in the docs site. The ranker being pure is not only good engineering hygiene. It means the core editorial transformation can be tested without a database, a web server, or a UI. The staging models being documented is not only dbt discipline. It means an analyst can understand the analytical contract without reading Python, then inspect it in the generated data reference. The OpenAPI files are not only client scaffolding. They make the app contract falsifiable in the generated API reference.
The tutorial format matters for the same reason. The lessons are not separate from the platform; the cumulative lesson state is the platform. That makes the artefact inspectable. A reader can see the data entering through dlt, landing as Parquet, taking queryable shape through DuckDB and dbt, moving through FastAPI, and appearing in the TypeScript editor interface. But this essay is not a walkthrough of those steps. Its position is that the deeper work in accountable recommenders is making editorial judgment legible, tunable, testable, and reviewable across the platform.
The most dangerous recommender failure in a media context is not always an obviously bad recommendation. Sometimes it is a locally successful system whose broader editorial effects nobody can see clearly. A platform-as-leverage approach does not solve that risk by declaring the model responsible. It solves it by giving the organisation surfaces where responsibility can be exercised: a candidate generator with modest claims, a ranker with explicit mixed enforcement, a configuration table with versioned settings, two contracts for two audiences, and evaluations that show what click optimisation costs. That is the architecture this tutorial argues for, and the standard by which the rest of the work should be judged.