Evaluation lessons
The full evaluation lesson uses the checked-in holdout fixture to exercise the same ranker the serving API uses. It evaluates eight pure metrics: NDCG@10, MRR, hit-rate@10, intra-list diversity, catalog coverage, median recency, sentiment-distribution divergence, and sensitive-topic exposure.
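To make "pure metric" concrete, here is a minimal sketch of the three ranking-quality metrics as side-effect-free functions. It is not the lesson's actual implementation: the function names, the binary-relevance assumption, and the use of item-id strings are all hypothetical choices made for illustration.

```python
import math
from typing import Sequence, Set


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """NDCG@k under binary relevance (assumption: an item is relevant or not)."""
    # Discounted gain of the relevant items found in the top k positions.
    dcg = sum(
        1.0 / math.log2(pos + 2)
        for pos, item in enumerate(ranked[:k])
        if item in relevant
    )
    # Ideal ordering places every relevant item first, capped at k.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant item, or 0.0 if none is present."""
    for pos, item in enumerate(ranked):
        if item in relevant:
            return 1.0 / (pos + 1)
    return 0.0


def hit_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """1.0 if any relevant item appears in the top k, else 0.0."""
    return 1.0 if any(item in relevant for item in ranked[:k]) else 0.0


def mean(values: Sequence[float]) -> float:
    """Average per-query scores; MRR is the mean of reciprocal_rank."""
    return sum(values) / len(values) if values else 0.0
```

Each function depends only on its arguments, so the same inputs always give the same score. MRR and hit-rate@10 over the holdout would then be the mean of the per-query values.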
This lesson is deliberately table-shaped: every metric and grid point maps back to rows in the analytical contract.
The sweep grid crosses all three soft editorial weights:
    diversity_weight: 0.00, 0.15, 0.30, 0.60, 1.00
    recency_weight:   0.00, 0.20, 0.40, 0.70, 1.00
    sentiment_weight: 0.00, 0.10, 0.20, 0.50, 1.00
This is a 5 by 5 by 5 sweep, 125 grid points in total. It includes the
click-only baseline and the platform default, then writes eval_sweep_results
as the analytical contract for SQL, notebooks, and chart generation.
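The grid can be enumerated as a plain Cartesian product. The sketch below is illustrative only: the constant names and the sweep_grid helper are hypothetical, and treating the all-zero point as the click-only baseline is an assumption, since the lesson does not spell out which grid point that is. The weight values are the ones listed in the grid.

```python
from itertools import product

# Weight values copied from the sweep grid; the constant names are hypothetical.
DIVERSITY_WEIGHTS = (0.00, 0.15, 0.30, 0.60, 1.00)
RECENCY_WEIGHTS = (0.00, 0.20, 0.40, 0.70, 1.00)
SENTIMENT_WEIGHTS = (0.00, 0.10, 0.20, 0.50, 1.00)


def sweep_grid() -> list[dict[str, float]]:
    """Return every combination of the three soft editorial weights."""
    return [
        {"diversity_weight": d, "recency_weight": r, "sentiment_weight": s}
        for d, r, s in product(
            DIVERSITY_WEIGHTS, RECENCY_WEIGHTS, SENTIMENT_WEIGHTS
        )
    ]


grid = sweep_grid()

# Assumption: with all three soft weights at zero, only the click signal is left.
assumed_click_only = {
    "diversity_weight": 0.0,
    "recency_weight": 0.0,
    "sentiment_weight": 0.0,
}
```

Here len(grid) is 125, and each dictionary would correspond to one weight configuration that the ranker is evaluated under before its metrics are written out.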