Problem Statement and Metrics
Learn about the problem statement and metrics for building an Ad click prediction machine learning system.
Ad click prediction
1. Problem statement
Predicting whether a user will click on an ad is a critical problem in digital advertising because it affects both revenue and user experience. The task is to build a machine learning model that predicts whether an ad will be clicked.
We’re focusing on a binary classification problem:
- Input: Features about the user, the ad, and the context
- Output: Probability that the user will click the ad (the training label is 1 for a click and 0 for no click)
To keep things simple, we won’t dive into the more complex multi-stage ad ranking pipelines (like cascaded classifiers). Instead, we’ll treat it as a standalone prediction problem.
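To make the input/output contract concrete, here is a minimal sketch of a standalone click predictor. It assumes a simple logistic model; the feature names, weights, and bias are hypothetical values chosen for illustration, not learned parameters or part of the original problem statement.

```python
import math


def predict_click_probability(features, weights, bias=0.0):
    """Return P(click) for one impression using a logistic model."""
    # Linear score over the features; features without a weight contribute 0.
    score = bias + sum(weights.get(name, 0.0) * value
                       for name, value in features.items())
    # The sigmoid maps the score to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical user, ad, and context features for a single impression.
features = {"user_past_ctr": 0.02, "ad_is_video": 1.0, "hour_is_evening": 1.0}
# Hypothetical weights; in practice these would be learned from click logs.
weights = {"user_past_ctr": 10.0, "ad_is_video": 0.5, "hour_is_evening": 0.3}

p_click = predict_click_probability(features, weights, bias=-3.0)
print(f"Predicted click probability: {p_click:.4f}")
```

The model returns a probability rather than a hard 0/1 decision, which is what lets a downstream system compare or rank candidate ads.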
Background: How ads are served
Before jumping into metrics and modeling, let’s understand how ads are typically served. Most ad delivery systems follow a waterfall model.
The waterfall revenue model
- Publishers try to sell ad impressions via direct deals with the highest CPM (Cost Per Mille).
- If the impression isn’t sold, it’s passed down the “waterfall” to other ad networks or exchanges.
- This continues until the impression is finally sold.
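The waterfall steps above can be sketched roughly as follows. This is an illustrative simplification: the `DemandSource` class, the `wants_impression` callback, and the example sources and prices are all hypothetical, and returning `None` when nobody buys is an assumption made so that the function always terminates.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class DemandSource:
    name: str
    cpm: float  # price paid per 1,000 impressions
    wants_impression: Callable[[dict], bool]  # hypothetical buy decision


def run_waterfall(impression: dict,
                  sources: List[DemandSource]) -> Optional[DemandSource]:
    """Offer the impression to sources from highest to lowest CPM."""
    for source in sorted(sources, key=lambda s: s.cpm, reverse=True):
        if source.wants_impression(impression):
            return source  # sold: stop passing the impression down
    return None  # assumption: the impression stays unsold


# Hypothetical demand sources: one direct deal and two fallbacks.
sources = [
    DemandSource("exchange_b", 1.5, lambda imp: True),
    DemandSource("direct_deal", 12.0, lambda imp: imp["country"] == "US"),
    DemandSource("network_a", 4.0, lambda imp: True),
]

winner = run_waterfall({"country": "DE"}, sources)
print(winner.name if winner else "unsold")
```

In this example the direct deal declines the impression, so it falls through to the next-highest CPM source.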
The faster and more accurately you predict which ad will be clicked, the better your system performs, not just in clicks but also in revenue and user engagement.
2. Metrics design and requirements
Metrics
During the training phase, we can focus on machine learning metrics instead of revenue metrics or CTR metrics. Below are the two metrics:
Offline metrics
- Normalized Cross-Entropy (NCE): NCE is the predictive log loss divided by the cross-entropy of the background CTR. This way, NCE is insensitive to the background CTR. This is the NCE formula:
...
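As a worked illustration of the description above, here is a minimal sketch of how NCE could be computed. It assumes the commonly used definition: the average log loss of the model's predictions divided by the log loss obtained by always predicting the background CTR (the empirical click rate of the same data). The function names, the clipping constant `eps`, and the sample labels are assumptions made for illustration.

```python
import math


def log_loss(labels, predictions, eps=1e-15):
    """Average binary cross-entropy between labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(labels, predictions):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)


def normalized_cross_entropy(labels, predictions):
    """Model log loss divided by the log loss of a constant background-CTR predictor."""
    background_ctr = sum(labels) / len(labels)
    # Baseline: predict the background CTR for every impression.
    baseline = log_loss(labels, [background_ctr] * len(labels))
    return log_loss(labels, predictions) / baseline


# Hypothetical labels: one click in four impressions (background CTR = 0.25).
labels = [1, 0, 0, 0]

nce_baseline = normalized_cross_entropy(labels, [0.25, 0.25, 0.25, 0.25])
nce_model = normalized_cross_entropy(labels, [0.9, 0.1, 0.1, 0.1])
print(nce_baseline, nce_model)
```

Under this definition, a model that only predicts the background CTR scores exactly 1, and lower values mean the model improves on that baseline. The sketch is not meaningful when the labels are all clicks or all non-clicks, because the baseline entropy is then zero before clipping.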