Ad CTR Prediction: Serving & Trade-Offs

Explore the design of ad click-through rate (CTR) prediction systems, focusing on serving latency, feature retrieval strategies, and feedback loops. Understand how to meet strict production time constraints while balancing stale and fresh data, preventing training-serving skew, and addressing label delay issues. Learn the trade-offs and engineering decisions behind scalable, reliable CTR prediction pipelines, which are critical for real-world ML system design.

We'll cover the following...

- Feature serving strategies
- The feedback loop pipeline
  - From impression to labeled example
    - The label delay problem
- L4, L5, and Staff+ answer depth
- Synthesis and design principles

With the model validated for accuracy, calibration, and fairness, the engineering challenge pivots sharply from offline metrics to production infrastructure. Every time a user loads a webpage or scrolls a feed, an ad auction fires. That auction has roughly 100 milliseconds, end to end, to retrieve candidate ads, assemble features, score each candidate with a CTR prediction model, run the auction logic, and return a winning ad. Miss that window, and the ad slot goes unfilled. Revenue evaporates.
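
The sequence of steps above can be sketched as a deadline-bounded pipeline. This is a minimal illustration rather than a production design: the stage functions (`retrieve_candidates`, `assemble_features`, `score`), the bids and predicted CTR values, and the bid × predicted-CTR ranking rule are all hypothetical stand-ins assumed for the example.

```python
import time

DEADLINE_MS = 100.0  # approximate end-to-end budget cited in the text


def retrieve_candidates(request):
    # Hypothetical stand-in: returns (ad_id, bid) pairs.
    return [("ad_a", 2.0), ("ad_b", 1.0), ("ad_c", 3.0)]


def assemble_features(request, ad_id):
    # Hypothetical stand-in: a real system would read from feature stores.
    return {"user": request["user_id"], "ad": ad_id}


def score(features):
    # Hypothetical stand-in for the CTR model: a fixed predicted CTR per ad.
    return {"ad_a": 0.05, "ad_b": 0.20, "ad_c": 0.01}[features["ad"]]


def serve_ad(request, deadline_ms=DEADLINE_MS, clock=time.monotonic):
    """Return the winning ad_id, or None if the deadline is missed."""
    start = clock()
    best_ad, best_value = None, float("-inf")
    for ad_id, bid in retrieve_candidates(request):
        pctr = score(assemble_features(request, ad_id))
        value = bid * pctr  # assumed ranking rule, chosen only for illustration
        if value > best_value:
            best_ad, best_value = ad_id, value
    elapsed_ms = (clock() - start) * 1000.0
    # A missed window means the slot goes unfilled.
    return best_ad if elapsed_ms <= deadline_ms else None
```

Passing the clock as a parameter is a design choice for this sketch: it lets a test simulate a slow request without actually sleeping.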

This latency budget is not theoretical. Systems at Google, Meta, and TikTok operate under this constraint at billions of requests per day. The budget breaks down approximately as follows: around 10 ms for ad retrieval and candidate selection, roughly 5 ms for feature assembly, another 5 ms for model inference, and the remaining time for network hops, auction ranking, and ad rendering. Each millisecond matters because the scoring service must evaluate hundreds of candidate ads in parallel within that envelope.
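
The arithmetic behind that breakdown can be written out as a small sketch. The stage allocations are the approximate figures quoted above; the 100 ms total is the rough end-to-end budget, and the `over_budget` helper and any measured latencies passed to it are hypothetical, added only to show how a per-stage check might look.

```python
TOTAL_BUDGET_MS = 100  # approximate end-to-end budget from the text

# Approximate stage allocations quoted above, in milliseconds.
stage_budget_ms = {
    "ad_retrieval": 10,
    "feature_assembly": 5,
    "model_inference": 5,
}

allocated_ms = sum(stage_budget_ms.values())
# What is left for network hops, auction ranking, and ad rendering.
remaining_ms = TOTAL_BUDGET_MS - allocated_ms


def over_budget(measured_ms):
    """Return the stages whose measured latency exceeds their allocation."""
    return sorted(
        stage
        for stage, ms in measured_ms.items()
        if ms > stage_budget_ms.get(stage, float("inf"))
    )


print(allocated_ms, remaining_ms)  # 20 80
```

For example, `over_budget({"ad_retrieval": 8, "feature_assembly": 7.5, "model_inference": 5})` flags only `feature_assembly`, the stage examined in the next section.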

The core interview question this lesson prepares you for is direct: “How would you design the serving path for a CTR prediction model that must return scores for hundreds of ad candidates within this budget?” The answer spans three pillars: feature serving strategies that keep retrieval under a millisecond, a feedback loop pipeline that continuously improves the model without introducing bias, and the trade-off reasoning that separates L4 answers from Staff+ answers.

The following diagram illustrates the full serving path from user request to ad response.

With the latency budget and architecture established, the next step is to examine the component that most frequently becomes the bottleneck: feature retrieval.

Feature serving strategies

A single CTR prediction request can require 50 to 200 features spanning user browsing history, ad metadata, advertiser attributes, and real-time context like device type or time of day. Fetching each feature individually from a database would blow through the latency budget before the model even runs. Production systems solve this with a layered approach that matches each feature’s freshness requirement to the right retrieval mechanism.
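
One way to picture the layered approach is to tag each feature with a retrieval tier and issue a single batched read per tier instead of one call per feature. The sketch below is illustrative only: the feature names, the two tiers, and the `InMemoryStore` class with its `multi_get` method are hypothetical stand-ins, and a real store would typically be keyed by entity ID rather than by feature name.

```python
from collections import defaultdict

# Hypothetical feature spec: feature name -> retrieval tier.
FEATURE_TIERS = {
    "user_7d_click_rate": "precomputed",   # slow-changing, refreshed in batch
    "advertiser_category": "precomputed",
    "device_type": "request",              # carried on the request itself
    "hour_of_day": "request",
}


class InMemoryStore:
    """Stand-in for a key-value feature store with a batched read."""

    def __init__(self, rows):
        self._rows = rows
        self.calls = 0  # counts round trips, to show the effect of batching

    def multi_get(self, names):
        self.calls += 1
        return {name: self._rows.get(name) for name in names}


def assemble_features(request_context, precomputed_store):
    """Build one feature dict using a single batched read per tier."""
    by_tier = defaultdict(list)
    for name, tier in FEATURE_TIERS.items():
        by_tier[tier].append(name)

    features = {}
    # One round trip for all precomputed features, not one per feature.
    features.update(precomputed_store.multi_get(by_tier["precomputed"]))
    # Request-time context needs no lookup at all.
    for name in by_tier["request"]:
        features[name] = request_context.get(name)
    return features
```

With two precomputed features the store sees one round trip rather than two; at 50 to 200 features per request, that difference is what keeps assembly inside its share of the budget.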

Pre-computation for slow-changing features

Batch pipelines built with tools ...
