Ad CTR Prediction: Data Strategy & Feature Engineering
Explore effective data strategies and feature engineering for ad click-through rate prediction systems. Understand how to handle high-cardinality user and ad features with embeddings and compression techniques. Learn the real-time versus pre-computed feature split for latency-sensitive serving and perform key storage and throughput estimations to design scalable, production-ready systems.
In a MAANG system design interview, proposing a transformer-based ranking model for ad CTR prediction earns you a nod. But the follow-up question, “How do you serve user features for a billion users in under 20 milliseconds?”, is where most candidates stumble. A model is only as good as the features it consumes, and in CTR prediction, feature engineering is the design problem that separates L4 answers from Staff-level ones.
The previous lesson established the eCPM equation (
Attention: A candidate who proposes a sophisticated model architecture but hand-waves feature design will fail at L5+ rounds. Interviewers probe how features are sourced, stored, and served under latency constraints, not just what the model looks like.
The four feature families
CTR prediction draws signal from four distinct families, each capturing a different dimension of the ad impression event. Think of it like a restaurant recommendation: you need to know the diner’s preferences (user), the dish being offered (ad), the time and setting of the meal (context), and whether this particular diner has enjoyed similar dishes before (cross features).
The following families form the organizing taxonomy for every CTR feature engineering discussion:
User features: These encode who is seeing the ad. Demographic attributes like age bucket, country, and device type provide static context. Behavioral signals such as historical CTR, category affinity scores, and session depth capture engagement patterns. Long-term signals like 30-day click-through rates on specific ad categories reveal stable preferences.
Ad features: These describe what is being shown. Creative metadata includes the ad format, whether the creative is an image or a video, and text length. Advertiser category, historical ad-level CTR, bid amount, and campaign age round out the ad’s profile.
Context features: These capture when and where the ...