
Ad CTR Prediction: Model Architecture

Explore key design decisions for ad click-through rate prediction models. Understand how to choose between Wide & Deep, Deep & Cross Network, and DLRM architectures based on memorization and generalization trade-offs. Learn calibration techniques like Platt scaling and isotonic regression to ensure auction accuracy. Discover multi-task learning approaches to optimize CTR, conversion rates, and long-term user value, improving both revenue and user experience.

In the previous lesson, you built four feature families (user profile, ad creative, context, and interaction features), compressed sparse IDs into dense embeddings, and split computation between real-time and pre-computed feature stores. Those features now arrive at the model’s input layer as a mix of sparse categorical embeddings and dense numerical vectors. The design question that follows is one of the most common in MAANG ML system design interviews: given this heterogeneous input under a sub-100 ms latency budget, which model architecture should you choose?

This lesson walks through three design decisions that determine whether your ad CTR prediction system actually works in production. First, you will compare three industry-standard architectures: Wide & Deep, Deep & Cross Network (DCN), and DLRM (Deep Learning Recommendation Model), each offering a different trade-off between memorization and generalization. Second, you will design a calibration layer that converts raw model outputs into true probabilities, without which the auction mechanism silently breaks. Third, you will build a multi-task learning setup that jointly optimizes CTR, conversion rate (CVR), and long-term user value. A miscalibrated model or a poorly chosen architecture does not just hurt a metric on a dashboard; it directly degrades ad revenue and user experience.
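To make the auction claim concrete, here is a hypothetical two-ad example. It assumes the auction ranks ads by bid × predicted CTR (a common simplification of expected-value ranking); the ad names, bids, and CTRs are invented for illustration.

```python
# Hypothetical example: two ads ranked by expected value per impression,
# bid * CTR. All numbers are illustrative, not real campaign data.

def rank_ads(bids, ctrs):
    """Return (ad names sorted by bid * CTR, highest first; the score dict)."""
    scores = {name: bids[name] * ctrs[name] for name in bids}
    return sorted(scores, key=scores.get, reverse=True), scores

bids = {"ad_a": 2.00, "ad_b": 1.00}        # advertiser bids per click ($)
true_ctr = {"ad_a": 0.010, "ad_b": 0.025}  # ground-truth click probabilities

# A miscalibrated model: overestimates ad_a's CTR threefold,
# predicts ad_b correctly.
pred_ctr = {"ad_a": 0.030, "ad_b": 0.025}

true_order, true_scores = rank_ads(bids, true_ctr)
pred_order, pred_scores = rank_ads(bids, pred_ctr)

print("ranking by true CTR:     ", true_order, true_scores)
print("ranking by predicted CTR:", pred_order, pred_scores)
```

The overestimate flips the winner: the model ranks ad_a first (0.06 vs. 0.025) even though ad_b has the higher true expected value (0.025 vs. 0.02). Multiplying every prediction by the same factor would leave this order unchanged; it is miscalibration that differs across ads that reorders the auction.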

Architecture comparison

The three architectures below represent the dominant approaches at Google, Meta, and LinkedIn for CTR prediction. Each handles the tension between memorization (recalling specific feature co-occurrences) and generalization (predicting clicks for unseen combinations) differently.

Wide & Deep

Google introduced Wide & Deep in 2016 to combine two complementary learning strategies in a single model. The wide component is a linear model that operates on raw features and manually engineered cross features. If user X has historically clicked ads from advertiser Y, the wide component memorizes that specific co-occurrence through a cross feature like user_id × advertiser_id. The deep component is a standard feed-forward network that takes dense embeddings as input and learns to generalize across unseen feature combinations. Both components are jointly trained end-to-end with a combined logistic loss.
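A minimal forward-pass sketch in plain Python shows how the two components combine. It assumes two ID fields (`user_id`, `advertiser_id`) and one dense feature (`hour_of_day`); all names, sizes, and the random initialization are hypothetical, and the backpropagation step that jointly updates both components is omitted. The key structural point is that the wide logit and the deep logit are summed before a single sigmoid, so one logistic loss trains both.

```python
import math
import random

random.seed(0)
EMB_DIM, HIDDEN = 4, 8  # hypothetical sizes

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# ---- Wide component: linear model over raw IDs plus a hand-crafted cross.
wide_weights = {}  # feature string -> learned weight (0.0 if never seen)
wide_bias = 0.0

def wide_features(f):
    return [
        f"user_id={f['user_id']}",
        f"advertiser_id={f['advertiser_id']}",
        f"user_id={f['user_id']} x advertiser_id={f['advertiser_id']}",  # cross
    ]

def wide_logit(f):
    return wide_bias + sum(wide_weights.get(k, 0.0) for k in wide_features(f))

# ---- Deep component: embeddings + dense feature -> one ReLU hidden layer.
def make_embedding_table():
    table = {}
    def lookup(key):
        if key not in table:  # lazily initialize unseen IDs
            table[key] = [random.gauss(0.0, 0.1) for _ in range(EMB_DIM)]
        return table[key]
    return lookup

user_emb, ad_emb = make_embedding_table(), make_embedding_table()
IN_DIM = 2 * EMB_DIM + 1  # two embeddings + one normalized dense feature
W1 = [[random.gauss(0.0, 0.1) for _ in range(IN_DIM)] for _ in range(HIDDEN)]
b1 = [0.0] * HIDDEN
w2 = [random.gauss(0.0, 0.1) for _ in range(HIDDEN)]
b2 = 0.0

def deep_logit(f):
    x = user_emb(f["user_id"]) + ad_emb(f["advertiser_id"]) + [f["hour_of_day"] / 23.0]
    h = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    return sum(w * hi for w, hi in zip(w2, h)) + b2

# ---- Joint output: both logits are summed before a single sigmoid.
def predict_ctr(f):
    return sigmoid(wide_logit(f) + deep_logit(f))

def log_loss(p, y, eps=1e-7):
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

example = {"user_id": 42, "advertiser_id": 7, "hour_of_day": 20}
p_before = predict_ctr(example)
wide_weights["user_id=42 x advertiser_id=7"] = 1.5  # pretend training memorized this pair
p_after = predict_ctr(example)
print(f"pCTR before={p_before:.3f}, after={p_after:.3f}, loss(click)={log_loss(p_after, 1):.3f}")
```

Setting a weight on the cross feature raises pCTR only for that exact (user, advertiser) pair, which is the memorization behavior; the embeddings in the deep path are what let the model score pairs it has never seen together.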

The limitation is clear: the wide component’s power depends entirely on the quality of hand-crafted cross features. Feature engineering becomes a bottleneck as the feature space grows.
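A quick count, using hypothetical field counts, shows why this scales poorly. The number of candidate field crosses an engineer would have to propose and evaluate grows combinatorially, and that is before counting the distinct value pairs each cross expands into (for example, one wide weight per observed user_id × advertiser_id pair).

```python
import math

def num_crosses(num_fields, order):
    """Number of distinct field-level crosses of a given order (e.g. 2 = pairwise)."""
    return math.comb(num_fields, order)

# Hypothetical field counts for a growing feature set.
for fields in (10, 50, 200):
    pairs = num_crosses(fields, 2)
    triples = num_crosses(fields, 3)
    print(f"{fields} fields: {pairs} pairwise crosses, {triples} three-way crosses")
```

Going from 50 to 200 fields takes the pairwise candidates from 1,225 to 19,900 and the three-way candidates from 19,600 to 1,313,400.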

Deep & Cross Network (DCN)

DCN replaces the manual cross features with an explicit cross network (a series of layers that automatically learn bounded-degree polynomial feature interactions without requiring manual feature engineering). Each cross layer applies the operation $x_{l+1} = x_0 \cdot x_l^T \cdot w + b + x_l$ ...
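A minimal pure-Python sketch of the cross operation $x_{l+1} = x_0 \cdot x_l^T \cdot w + b + x_l$, assuming each layer has its own weight vector `w` and bias vector `b` of the input dimension $d$. The dimensions and random initialization are hypothetical, and only the forward computation is shown, not training.

```python
import random

random.seed(0)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_layer(x0, xl, w, b):
    """One cross layer: x_{l+1} = x0 * (xl^T w) + b + xl.

    Grouping x0 (xl^T w) keeps the cost O(d) instead of building the
    d x d matrix x0 xl^T; the "+ xl" residual keeps lower-order terms.
    """
    s = dot(xl, w)  # scalar projection of the current layer's output
    return [x0_i * s + b_i + xl_i for x0_i, b_i, xl_i in zip(x0, b, xl)]

def cross_network(x0, weights, biases):
    """Stack cross layers; L layers yield interactions up to degree L + 1 in x0."""
    x = x0
    for w, b in zip(weights, biases):
        x = cross_layer(x0, x, w, b)
    return x

# Hypothetical input: d = 4 stands in for concatenated embeddings + dense features.
d, num_layers = 4, 3
x0 = [random.gauss(0.0, 1.0) for _ in range(d)]
weights = [[random.gauss(0.0, 0.1) for _ in range(d)] for _ in range(num_layers)]
biases = [[0.0] * d for _ in range(num_layers)]
out = cross_network(x0, weights, biases)
print(len(out), out)
```

Because $x_l^T w$ is a scalar, each layer adds only $2d$ parameters ($w$ and $b$), which is why a cross network can model explicit feature interactions far more cheaply than enumerating cross features by hand.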