
Content Moderation: Serving & Trade-Offs

Explore how to design a content moderation system that handles millions of posts per minute using a three-tier serving architecture. Understand routing logic as a machine learning decision, balancing automation with human review. Learn to implement feedback loops that convert human decisions into training data while addressing selection bias and operational trade-offs, preparing you for advanced ML system design interviews.

The previous lesson established four evaluation pillars: cost-sensitive thresholds, slice-based evaluation, counterfactual fairness, and queue prioritization as ranking. Now the focus shifts to how the moderation system actually serves predictions in production and feeds decisions back into model improvement. Consider the core interview question that ties this all together: How would you design the serving architecture for a content moderation system that handles millions of posts per minute while balancing automation speed against human judgment accuracy?

The answer, used at scale by platforms like YouTube, Facebook, and TikTok, follows a three-tier serving pattern. Tier 1 handles automated real-time filtering. Tier 2 applies ML-driven escalation for uncertain cases. Tier 3 routes high-severity or ambiguous content to human reviewers. This lesson walks through the architecture, frames the routing logic as its own ML problem, designs the feedback loop that turns human decisions into training labels, and closes with a leveled answer comparison for interview preparation.
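To make the flow concrete, here is a minimal sketch of the three tiers as one function. It assumes stub scoring functions (any callable that returns a calibrated violation probability) and fixed illustrative thresholds of 0.95 and 0.05. It also assumes that high-severity content is never auto-removed, so it reaches a human unless Tier 1 is confident it is benign. The names `moderate`, `ModerationResult`, `act_threshold`, and `pass_threshold` are hypothetical. Because the routing discussion below argues that routing should be a learned policy, treat the thresholds as placeholders for that policy.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, List

Scorer = Callable[[str], float]  # hypothetical: returns a calibrated P(violation)


@dataclass
class ModerationResult:
    post_id: str
    decision: str  # "auto_remove", "allow", or "human_review"
    tier: int      # tier that made the final call
    score: float   # last score computed before deciding


def moderate(
    post_id: str,
    text: str,
    high_severity: bool,
    tier1_model: Scorer,
    tier2_models: List[Scorer],
    act_threshold: float = 0.95,   # illustrative placeholder, not a recommended value
    pass_threshold: float = 0.05,  # illustrative placeholder, not a recommended value
) -> ModerationResult:
    # Tier 1: a single cheap classifier clears the confident bulk in real time.
    s1 = tier1_model(text)
    if s1 <= pass_threshold:
        return ModerationResult(post_id, "allow", 1, s1)
    if s1 >= act_threshold and not high_severity:
        return ModerationResult(post_id, "auto_remove", 1, s1)

    # Tier 2: rescore uncertain or high-severity content with a slower ensemble.
    s2 = mean([m(text) for m in tier2_models])
    if not high_severity:
        if s2 >= act_threshold:
            return ModerationResult(post_id, "auto_remove", 2, s2)
        if s2 <= pass_threshold:
            return ModerationResult(post_id, "allow", 2, s2)

    # Tier 3: high-severity or still-ambiguous content goes to human reviewers;
    # the ensemble score s2 can double as the review-queue priority.
    return ModerationResult(post_id, "human_review", 3, s2)
```

For example, `moderate("p3", "...", False, lambda t: 0.5, [lambda t: 0.4, lambda t: 0.6])` leaves the ensemble at 0.5 and sends the post to human review. Only the small uncertain slice pays for the ensemble, and only the smallest slice pays for human time. That ordering is the main reason for the tiered structure.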

The following diagram captures the full three-tier architecture and the feedback loop that connects human decisions back to model retraining.

Three-tier content moderation architecture with automated classification, ensemble rescoring, human review, and feedback-driven retraining.

Routing logic as an ML decision

Why routing is not a set of hardcoded rules

A common misconception is that routing between tiers relies on simple if-else thresholds. In production systems, the routing policy is itself a lightweight ML model. The Tier 1 classifier outputs a calibrated confidence score (a probability estimate that reflects the true likelihood of the predicted class, so a score of 0.8 should correspond to an 80% actual violation rate) for each violation category. The routing policy consumes these scores alongside contextual signals, such as content virality velocity, author trust score, and content modality, to decide whether to auto-act, escalate to Tier 2, or pass the content through.
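As an illustration of routing as a model rather than a rule table, here is a minimal sketch of a three-way softmax (multinomial logistic) policy over the signals named above. The feature encoding is an assumption: virality velocity and author trust are normalized to [0, 1], modality is reduced to an `is_video` flag, and the highest category score is used as the severity signal. `WEIGHTS`, `RoutingInput`, `featurize`, and `route` are hypothetical names, and the weights are hand-set for illustration. In practice they would be learned from historical outcomes rather than written by hand.

```python
import math
from dataclasses import dataclass
from typing import Dict, Tuple

ACTIONS = ("pass", "escalate", "auto_act")

# Illustrative, hand-set weights only. A real policy would learn these from
# historical outcomes such as human overturns of automated actions.
WEIGHTS: Dict[str, Dict[str, float]] = {
    "pass":     {"bias": 3.0,  "score": -8.0, "virality": -1.0, "trust": 1.0,  "video": 0.0},
    "escalate": {"bias": 0.0,  "score": 0.0,  "virality": 2.0,  "trust": 0.0,  "video": 0.5},
    "auto_act": {"bias": -5.0, "score": 8.0,  "virality": -1.0, "trust": -1.0, "video": -0.5},
}


@dataclass
class RoutingInput:
    category_scores: Dict[str, float]  # calibrated P(violation) per category
    virality_velocity: float           # hypothetical: normalized to [0, 1]
    author_trust: float                # hypothetical: 0 = new/flagged, 1 = trusted
    modality: str                      # "text", "image", or "video"


def featurize(x: RoutingInput) -> Dict[str, float]:
    return {
        "bias": 1.0,
        "score": max(x.category_scores.values()),  # highest-scoring category
        "virality": x.virality_velocity,
        "trust": x.author_trust,
        "video": 1.0 if x.modality == "video" else 0.0,
    }


def route(x: RoutingInput) -> Tuple[str, Dict[str, float]]:
    feats = featurize(x)
    # One linear logit per action.
    logits = {a: sum(WEIGHTS[a][k] * v for k, v in feats.items()) for a in ACTIONS}
    # Softmax over the three actions (subtract the max for numerical stability).
    m = max(logits.values())
    exps = {a: math.exp(z - m) for a, z in logits.items()}
    total = sum(exps.values())
    probs = {a: e / total for a, e in exps.items()}
    return max(probs, key=probs.get), probs
```

With these weights, a 0.98 spam score from a low-trust author on slow-moving text routes to `auto_act`. A mid-confidence (0.6) video that is spreading quickly routes to `escalate`, even though a fixed 0.5 cutoff on the score alone would have auto-acted on it. Returning the full probability vector also lets downstream queues use routing uncertainty as a priority signal.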

This routing model is trained on historical outcomes, specifically cases where auto-action was correct vs. cases where human reviewers overturned automated ...