Social Feed Ranking: Evaluation & Experimentation

Explore comprehensive evaluation methods for social feed ranking systems to prevent metric cannibalization and ensure fairness. Understand guardrail metrics, interleaving for fast ranking comparison, creator equity measurement, and fairness evaluations. This lesson equips you to design layered experiment frameworks that address engagement, creator diversity, and bias risks critical for senior ML system design interviews.

We'll cover the following...

Metric cannibalization in A/B testing
- Understanding the failure mode
- Designing guardrail metrics
Interleaving for rapid ranking comparison
- How team-draft interleaving works
  - Correcting for position bias
  - Limitations and the two-stage workflow
Creator equity as an experiment outcome
Fairness evaluation as a design constraint
- The evaluation protocol
Conclusion

With a multi-task MMoE architecture, constrained scalarization, and a long-term value head in place, the ranking model is ready to score content. But how do you prove it actually works without introducing hidden regressions? This is the question that separates competent ML engineers from Staff+ candidates in system design interviews. Facebook learned this lesson the hard way when its engagement-optimized feed eroded meaningful social interactions, forcing an architectural overhaul in 2018. The problem was not the model itself but an evaluation framework that could not catch slow-moving damage to platform health.

This lesson covers the four pillars of a robust evaluation strategy for social feed ranking. First, guardrail metrics that prevent metric cannibalization during A/B testing. Second, interleaving as a fast screening method for ranking comparison. Third, creator equity as a core experiment outcome. Fourth, fairness evaluation as a hard design constraint. Mastering these pillars gives you a layered evaluation narrative that interviewers expect at senior levels.

Metric cannibalization in A/B testing

Understanding the failure mode

Metric cannibalization occurs when optimizing for a primary metric such as clicks or session time systematically degrades a secondary metric like unfollow rate, 28-day retention, or content diversity. Standard A/B tests are particularly vulnerable to this because they typically run for one to two weeks. That is long enough to capture short-term engagement lifts but far too short to observe the slow erosion of connection quality, which can take several weeks or longer to surface.
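The definition above can be expressed as a small check. This is a minimal sketch, assuming each metric is summarized as a control value, a treatment value, and a direction (lower is better for unfollow or mute rate, higher is better for retention). The `MetricReadout` class, the `cannibalized_metrics` function, and all numbers are hypothetical. The key detail is normalizing the sign so that "degraded" means the same thing for every metric:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricReadout:
    """Control and treatment values for one experiment metric (hypothetical structure)."""
    name: str
    control: float
    treatment: float
    higher_is_better: bool

    @property
    def signed_lift(self) -> float:
        """Relative lift, signed so that a positive value always means 'better for the platform'."""
        lift = (self.treatment - self.control) / self.control
        return lift if self.higher_is_better else -lift


def cannibalized_metrics(primary: MetricReadout, secondaries: list[MetricReadout]) -> list[str]:
    """Return the secondary metrics that degraded while the primary metric improved."""
    if primary.signed_lift <= 0:
        return []  # No primary win, so nothing is being cannibalized.
    return [m.name for m in secondaries if m.signed_lift < 0]


# Hypothetical readouts: CTR up 3%, creator mute rate up 8%, retention flat.
ctr = MetricReadout("ctr", control=0.040, treatment=0.0412, higher_is_better=True)
mute_rate = MetricReadout("creator_mute_rate", control=0.010, treatment=0.0108, higher_is_better=False)
retention_28d = MetricReadout("retention_28d", control=0.60, treatment=0.60, higher_is_better=True)

print(cannibalized_metrics(ctr, [mute_rate, retention_28d]))
```

This prints `['creator_mute_rate']`. A point estimate alone says nothing about noise, so in practice a check like this would sit on top of confidence intervals rather than raw lifts.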

Consider a concrete scenario. A new ranking model increases click-through rate by 3%, and the experiment looks like a clear win after two weeks. However, over four weeks the creator mute rate climbs by 8%. Users are clicking more on sensational content but quietly disconnecting from creators they used to value. The short experiment window never surfaces this net negative.
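Why a two-week readout misses the mute-rate regression can be made concrete with a toy model. The sketch below assumes, purely for illustration, that the treatment's effect on mute rate grows linearly from zero to 8% by day 28 and that every day contributes equal traffic. Neither assumption comes from the scenario itself, and `daily_effect` and `window_average_lift` are hypothetical helpers. A readout that pools the whole experiment window then dilutes a harm that is still growing:

```python
def daily_effect(day: int, final_effect: float = 0.08, ramp_days: int = 28) -> float:
    """Hypothetical relative lift in mute rate on a given experiment day (1-indexed).

    Assumes the effect grows linearly from 0 to `final_effect` by `ramp_days`
    and stays flat afterwards.
    """
    return final_effect * min(day, ramp_days) / ramp_days


def window_average_lift(window_days: int) -> float:
    """Pooled lift a readout over days 1..window_days would report, assuming equal daily traffic."""
    return sum(daily_effect(d) for d in range(1, window_days + 1)) / window_days


for window in (7, 14, 28):
    print(f"{window:>2}-day readout: {window_average_lift(window):.2%} average mute-rate lift")
```

Under these assumptions, the 14-day readout reports roughly 2.1% and the 28-day readout roughly 4.1%, even though the day-28 effect is 8%. Plotting the treatment effect per week, rather than relying on a single pooled number, makes a rising trend visible before the effect fully matures.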

Attention: Metric cannibalization is the single most common evaluation failure in social feed ranking. If your interview answer only mentions engagement metrics, you are leaving a critical gap.

Designing guardrail metrics

Guardrail metrics solve this problem by acting as hard constraints that must not regress beyond a ...
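One way to make "must not regress beyond a threshold" operational for a rate metric is a one-sided, non-inferiority-style test. This is a minimal sketch, assuming binary per-user outcomes (muted a creator or not, retained or not), a normal approximation to the difference of two proportions, and a relative tolerance converted to an absolute margin using the control rate. The function name `guardrail_passes`, the 2% tolerance, and the sample counts are hypothetical, not values from this lesson:

```python
from math import sqrt
from statistics import NormalDist


def guardrail_passes(
    x_control: int,
    n_control: int,
    x_treatment: int,
    n_treatment: int,
    max_relative_regression: float,
    higher_is_better: bool,
    alpha: float = 0.05,
) -> bool:
    """Non-inferiority-style guardrail check for a rate metric (e.g. mute rate, 28-day retention).

    Passes only if the one-sided (1 - alpha) upper bound on the harm is within
    max_relative_regression * control_rate. Treats the control rate as fixed
    when converting the relative tolerance into an absolute margin.
    """
    p_c = x_control / n_control
    p_t = x_treatment / n_treatment
    se = sqrt(p_c * (1 - p_c) / n_control + p_t * (1 - p_t) / n_treatment)
    z = NormalDist().inv_cdf(1 - alpha)  # about 1.645 for alpha = 0.05

    diff = p_t - p_c
    harm = -diff if higher_is_better else diff  # positive harm means the treatment is worse
    worst_case_harm = harm + z * se
    return worst_case_harm <= max_relative_regression * p_c


# Hypothetical readouts with 1M users per arm and a hypothetical 2% relative tolerance.
mute_ok = guardrail_passes(10_000, 1_000_000, 10_800, 1_000_000, 0.02, higher_is_better=False)
retention_ok = guardrail_passes(600_000, 1_000_000, 599_500, 1_000_000, 0.02, higher_is_better=True)
print(f"mute-rate guardrail passes: {mute_ok}")
print(f"28-day retention guardrail passes: {retention_ok}")
```

Here the mute-rate guardrail fails and the retention guardrail passes. Putting the burden of proof on the treatment means an underpowered experiment fails the guardrail by default. The alternative, failing only on a confidently detected regression, lets noisy experiments slip harmful changes through.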
