Ad CTR Prediction: Serving & Trade-Offs

Explore the design of ad click-through rate prediction systems focusing on serving latency, feature retrieval strategies, and feedback loops. Understand how to meet strict production time constraints while balancing stale and fresh data, preventing training-serving skew, and addressing label delay issues. Learn the trade-offs and engineering decisions to build scalable, reliable CTR prediction pipelines critical for real-world ML system design.

With the model validated for accuracy, calibration, and fairness, the engineering challenge pivots sharply from offline metrics to production infrastructure. Every time a user loads a webpage or scrolls a feed, an ad auction fires. That auction has roughly 100 milliseconds, end to end, to retrieve candidate ads, assemble features, score each candidate with a CTR prediction model, run the auction logic, and return a winning ad. Miss that window, and the ad slot goes unfilled. Revenue evaporates.

This latency budget is not theoretical. Systems at Google, Meta, and TikTok operate under this constraint at billions of requests per day. The budget breaks down approximately as follows: around 10ms for ad retrieval and candidate selection, roughly 5ms for feature assembly, another 5ms for model inference, and the remaining 80ms or so consumed by network hops, auction ranking, and ad rendering. Each millisecond matters because the scoring service must evaluate hundreds of candidate ads in parallel within that envelope.
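The arithmetic behind this breakdown can be made explicit with a small sketch. The stage names, the `observed` timings, and both helper functions below are hypothetical and exist only for illustration; the budget figures are the approximate ones quoted above, not measurements from any real system.

```python
# Illustrative latency budget check. All numbers are the approximate
# figures from the text; stage names and helpers are assumptions.
TOTAL_BUDGET_MS = 100.0

STAGE_BUDGETS_MS = {
    "candidate_retrieval": 10.0,
    "feature_assembly": 5.0,
    "model_inference": 5.0,
}


def remaining_budget_ms(total_ms: float, stage_budgets: dict) -> float:
    """Time left for network hops, auction ranking, and ad rendering."""
    return total_ms - sum(stage_budgets.values())


def over_budget_stages(observed_ms: dict, stage_budgets: dict) -> list:
    """Return the stages whose observed latency exceeds their allotment."""
    return [
        stage
        for stage, budget in stage_budgets.items()
        if observed_ms.get(stage, 0.0) > budget
    ]


remaining = remaining_budget_ms(TOTAL_BUDGET_MS, STAGE_BUDGETS_MS)

# Hypothetical timings for one request: feature assembly ran long.
observed = {
    "candidate_retrieval": 8.0,
    "feature_assembly": 7.5,
    "model_inference": 4.0,
}
slow = over_budget_stages(observed, STAGE_BUDGETS_MS)

print(f"remaining for network/auction/rendering: {remaining} ms")
print(f"stages over budget: {slow}")
```

Tracking a per-stage allotment rather than only the end-to-end total makes it easier to see which component is eating the envelope when a request runs late.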

The core interview question this lesson prepares you for is direct: “How would you design the serving path for a CTR prediction model that must return scores for hundreds of ad candidates within this budget?” The answer spans three pillars: feature serving strategies that keep retrieval under a millisecond, a feedback loop pipeline that continuously improves the model without introducing bias, and the trade-off reasoning that separates L4 answers from Staff+ answers.

The following diagram illustrates the full serving path from user request to ad response.

End-to-end ad serving latency budget with feature store as central hub

With the latency budget and architecture established, the next step is to examine the component that most frequently becomes the bottleneck: feature retrieval.

Feature serving strategies

A single CTR prediction request can require 50 to 200 features spanning user browsing history, ad metadata, advertiser attributes, and real-time context like device type or time of day. Fetching each feature individually from a database would blow through the latency budget before the model even runs. Production systems solve this with a layered approach that matches each feature’s freshness requirement to the right retrieval mechanism.
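To see why per-feature fetching fails, consider a rough sketch. It assumes a hypothetical cost of 1ms per round trip and uses an in-memory dictionary as a stand-in for a feature store; `FEATURE_STORE`, `ASSUMED_LOOKUP_MS`, and both functions are illustrative names, not part of any real library. Grouping lookups by entity is shown as one possible mitigation, not necessarily the approach described in the rest of this lesson.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for a feature store,
# keyed by (entity_type, entity_id). Values are made up.
FEATURE_STORE = {
    ("user", "u1"): {"user_clicks_7d": 12, "user_age_bucket": 3},
    ("ad", "a9"): {"ad_ctr_30d": 0.021, "ad_category": "travel"},
    ("advertiser", "adv4"): {"advertiser_quality": 0.8},
}

ASSUMED_LOOKUP_MS = 1.0  # assumption: cost of one round trip


def sequential_cost_ms(num_features, per_lookup_ms=ASSUMED_LOOKUP_MS):
    """Estimated latency if every feature is fetched one at a time."""
    return num_features * per_lookup_ms


def fetch_batched(requests):
    """Group requested features by entity and read each entity row once.

    requests: list of (entity_type, entity_id, feature_name) tuples.
    Returns (features, number_of_lookups).
    """
    by_entity = defaultdict(list)
    for entity_type, entity_id, name in requests:
        by_entity[(entity_type, entity_id)].append(name)

    features = {}
    for key, names in by_entity.items():
        row = FEATURE_STORE.get(key, {})  # one lookup per entity
        for name in names:
            features[name] = row.get(name)  # None when the feature is missing
    return features, len(by_entity)


wanted = [
    ("user", "u1", "user_clicks_7d"),
    ("user", "u1", "user_age_bucket"),
    ("ad", "a9", "ad_ctr_30d"),
    ("ad", "a9", "ad_category"),
    ("advertiser", "adv4", "advertiser_quality"),
]
features, lookups = fetch_batched(wanted)

print(sequential_cost_ms(50), sequential_cost_ms(200))
print(lookups, features)
```

Under the assumed 1ms per lookup, 50 to 200 sequential fetches would cost 50 to 200ms, which by itself consumes or exceeds the 100ms end-to-end budget. Reading one row per entity cuts the five lookups in this example to three, and the saving grows as more features share an entity.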

Pre-computation for slow-changing features

Batch pipelines built with tools ...