Problem Statement and Metrics
Let’s dive into the problem statement and metrics required for the Airbnb rental search ranking application.
Airbnb rental search ranking
1. Problem statement
Airbnb users often search for homes in a specific location. The platform must return relevant stays—but more than that, it must return homes users are likely to book. So the goal of the ranking system is simple:
Rank the homes such that those most likely to be booked appear higher in the search results.
A naive method might use keyword matching or hand-crafted scoring, such as sorting listings by the similarity between the query and listing descriptions. This fails in practice: text similarity can surface results that “sound good” but do not necessarily lead to bookings.
Instead, we want a data-driven approach. If we could estimate the likelihood of a user booking a given listing, we could rank by that likelihood. That brings us to the core idea:
Train a supervised machine learning model that learns from historical user sessions and predicts whether a listing will be booked. This becomes a binary classification task: booked vs. not booked.
Why binary classification?
- Our outcome is binary (booked or not).
- It gives flexibility in evaluating ranking quality, analyzing user behavior, and optimizing for downstream metrics like revenue.
- Alternative formulations such as regression (predicting a booking score directly) could work, but a classifier already outputs a booking probability we can rank by, and its decision threshold gives more control when balancing false positives against false negatives, which matters in high-stakes ranking.
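To make the idea of "predict booked vs. not booked, then rank by the predicted likelihood" concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the production approach: the two features (`review_score`, `price_ratio`), the toy sessions, the listing IDs, and the choice of logistic regression trained by batch gradient descent are all hypothetical stand-ins for whatever features and model a real system would use.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, weights, bias):
    """Predicted probability that a listing with features x gets booked."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

def train_logistic(rows, labels, lr=0.5, epochs=1000):
    """Fit a logistic regression classifier with batch gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = predict_proba(x, weights, bias) - y  # gradient of log loss w.r.t. the logit
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias

# Hypothetical historical sessions: [review_score, price_ratio] -> booked (1) or not (0).
# price_ratio is an assumed feature: listing price divided by the user's budget.
rows = [[0.9, 0.5], [0.8, 0.7], [0.7, 0.6], [0.4, 1.2], [0.3, 1.5], [0.5, 1.4]]
labels = [1, 1, 1, 0, 0, 0]
weights, bias = train_logistic(rows, labels)

# Candidate listings returned for a new search (made-up IDs and features).
candidates = {
    "listing_c": [0.2, 1.6],
    "listing_a": [0.95, 0.6],
    "listing_b": [0.6, 1.0],
}
probs = {lid: predict_proba(x, weights, bias) for lid, x in candidates.items()}

# Rank so that the listings most likely to be booked appear first.
ranked = sorted(probs, key=probs.get, reverse=True)
print(ranked)
```

The classifier is never asked for a hard booked/not-booked decision here; only its probability output is used, as the sort key for the search results.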
2. Metrics design and requirements
Metrics
Designing the right metrics is just as important as choosing the algorithm. Optimizing for the wrong metric rewards the wrong behavior.
We break down metrics into two buckets: offline metrics (evaluated on historical data during training and model development) and online metrics (measured in production).
Offline metrics
- Normalized Discounted Cumulative Gain (nDCG): a standard metric for ranking problems where position matters. It gives higher weight to correct predictions near the top of the list, which is exactly what we want in search ranking.
Why nDCG?
- Users rarely scroll through all results. A relevant result at position 2 is more valuable than at position 10.
- It accounts for both relevance and position, unlike basic accuracy or AUC.
- It reflects user satisfaction better than simple classification metrics like precision or recall.
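The position sensitivity described above can be checked with a small sketch. It assumes the common formulation DCG = sum over positions i of rel_i / log2(i + 1), normalized by the DCG of the ideal ordering, and it assumes binary relevance (1 if the listing was booked, 0 otherwise). A real system might instead use graded relevance, for example separate values for clicks and bookings.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain: each position i (1-based) is discounted by log2(i + 1)."""
    return sum(rel / math.log2(pos + 1) for pos, rel in enumerate(relevances, start=1))

def ndcg(relevances, k=None):
    """DCG of the given ranking divided by the DCG of the best possible ordering."""
    ranked = relevances[:k] if k else relevances
    ideal = sorted(relevances, reverse=True)[:len(ranked)]
    ideal_dcg = dcg(ideal)
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0

# One booked listing among ten results (assumed binary relevance: 1 = booked).
booked_at_2 = [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
booked_at_10 = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]

print(round(ndcg(booked_at_2), 3))   # 1 / log2(3)
print(round(ndcg(booked_at_10), 3))  # 1 / log2(11)
```

Both rankings contain the same booked listing, so a position-agnostic metric computed over the full list would score them identically. nDCG penalizes the second ranking because the booked listing sits at the bottom.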
...