Problem Statement and Metrics

Learn about the problem statement and metrics for building an Ad click prediction machine learning system.

Ad click prediction

1. Problem statement

Predicting whether a user will click on an ad is a critical problem in digital advertising because it affects both revenue and user experience. The goal is to build a machine learning model that predicts whether an ad will be clicked.

We’re focusing on a binary classification problem:

  • Input: Features about the user, the ad, and the context

  • Output: Probability that the user will click the ad (the training label is 1 for a click, 0 for no click)

To keep things simple, we won’t dive into the more complex multi-stage ad ranking pipelines (like cascaded classifiers). Instead, we’ll treat it as a standalone prediction problem.
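
As a rough illustration of the input/output contract described above, the sketch below scores a single impression with a logistic model. The feature names, weights, and bias are hypothetical values chosen for the example, and logistic regression is only one possible model choice, not necessarily the one used later.

```python
import math

def predict_click_probability(features, weights, bias=0.0):
    """Return P(click) for one impression using a simple logistic model."""
    # Linear score over the features present in this impression.
    score = bias + sum(weights.get(name, 0.0) * value
                       for name, value in features.items())
    # The sigmoid maps the score to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical user, ad, and context features (illustration only).
features = {"user_past_ctr": 0.05, "ad_is_video": 1.0, "hour_is_evening": 1.0}
# Hypothetical learned weights (illustration only).
weights = {"user_past_ctr": 4.0, "ad_is_video": 0.3, "hour_is_evening": 0.2}

p_click = predict_click_probability(features, weights, bias=-3.0)
print(f"Predicted click probability: {p_click:.4f}")
```

The model outputs a probability rather than a hard 0/1 decision, which matches the output defined above; the 1/0 values are the labels the model is trained against.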

[Figure: Ads recommendation system]

Background: How ads are served

Before jumping into metrics and modeling, let’s understand how ads are typically served. Most ad delivery systems follow a waterfall model.

The waterfall revenue model
  1. Publishers first try to sell ad impressions through direct deals, which carry the highest CPM (Cost Per Mille, the price per thousand impressions).

  2. If the impression isn’t sold, it’s passed down the “waterfall” to other ad networks or exchanges.

  3. This continues until the impression is finally sold.
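
The three steps above can be sketched as a loop over demand sources. This is a simplified illustration, not a real ad server: the `DemandSource` class, the `buys` flag (a stub for whether a source accepts the impression), the strict ordering by CPM, and the source names are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class DemandSource:
    name: str
    cpm: float  # price offered per 1,000 impressions
    buys: bool  # stub: whether this source buys the impression

def run_waterfall(sources: List[DemandSource]) -> Optional[DemandSource]:
    """Offer one impression to sources in descending CPM order.

    Returns the first source that buys, or None if nobody does
    (an assumption; the text only covers the case where it sells).
    """
    for source in sorted(sources, key=lambda s: s.cpm, reverse=True):
        if source.buys:
            return source
    return None

# Hypothetical demand sources (illustration only).
sources = [
    DemandSource("ad_network_a", cpm=2.0, buys=True),
    DemandSource("direct_deal", cpm=5.0, buys=False),
    DemandSource("ad_exchange_b", cpm=1.2, buys=True),
]

winner = run_waterfall(sources)
print(winner.name if winner else "unsold")
```

Here the direct deal is offered the impression first but declines, so the impression falls to the next-highest CPM source that accepts it.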

The faster and more accurately the system predicts which ad will be clicked, the better it performs, not only in clicks but also in revenue and user engagement.

[Figure: Waterfall Revenue Model]

2. Metrics design and requirements

Metrics

During the training phase, we can focus on machine learning metrics rather than revenue or CTR metrics. The two types of metrics are described below:

Offline metrics
  • Normalized Cross-Entropy (NCE): NCE is the predictive log loss divided by the cross-entropy of the background CTR. This normalization makes NCE insensitive to the background CTR. The NCE formula is:

$$NCE = \frac{-\frac{1}{N} \sum_{i=1}^{N} \left( \frac{1+y_i}{2} \log(p_i) + \frac{1-y_i}{2} \log(1-p_i) \right)}{-\left( p \log(p) + (1-p) \log(1-p) \right)}$$

...
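
To make the NCE formula concrete, here is a minimal sketch in plain Python. It assumes that the labels y_i take values in {-1, +1}, which is what the (1 + y_i)/2 terms suggest, that p_i is the predicted click probability, and that p is the background CTR estimated as the fraction of clicked examples. The function name, the clipping constant `eps`, and the sample data are hypothetical.

```python
import math

def normalized_cross_entropy(labels, probs, eps=1e-12):
    """Average log loss divided by the entropy of the background CTR.

    labels: +1 for click, -1 for no click (assumed convention).
    probs:  predicted click probabilities p_i.
    Assumes labels contain at least one click and one non-click.
    """
    n = len(labels)
    log_loss = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        log_loss -= (1 + y) / 2 * math.log(p) + (1 - y) / 2 * math.log(1 - p)
    log_loss /= n

    # Background CTR: fraction of examples that were clicked.
    ctr = sum(1 for y in labels if y == 1) / n
    background_entropy = -(ctr * math.log(ctr) + (1 - ctr) * math.log(1 - ctr))
    return log_loss / background_entropy

# Hypothetical data: one click out of four impressions.
labels = [1, -1, -1, -1]
baseline = normalized_cross_entropy(labels, [0.25] * 4)  # always predict the CTR
better = normalized_cross_entropy(labels, [0.7, 0.1, 0.2, 0.1])
print(baseline, better)
```

A model that always predicts the background CTR scores an NCE of about 1, so values below 1 indicate predictions that are more informative than that baseline, and lower is better.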