
Samples as Tests

Explore how to use real-world data samples such as corner cases, hard samples, and holdout sets to evaluate and ensure the reliability of machine learning models. Understand the benefits and challenges of this approach through examples from matching, search ranking, and demand forecasting.

Overview

Samples as tests refers to using real data as fixtures to evaluate the performance of a machine learning model. This is often more effective than testing with synthetic data, which may not reflect the complexity and variability of real-world data.

Several types of real data can be used as samples for testing, including:

  • corner cases (edge cases that are unusual or extreme).

  • hard samples (samples that the model finds difficult to predict correctly).

  • representative samples (samples that accurately reflect the overall characteristics of the data).

  • holdout sets (data that is withheld during training and used to evaluate the model’s performance).
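
As a rough illustration of the sample types listed above, here is a minimal sketch of how they might be wired into pytest-style tests for a matching task. Everything in it is an assumption made for illustration: `predict_match` is a toy token-overlap function standing in for a trained model, and the sample records, the helper names, and the 0.75 accuracy threshold are hypothetical. In practice the samples would be collected from real data rather than written by hand.

```python
import re


def predict_match(a: str, b: str, threshold: float = 0.5) -> bool:
    """Toy stand-in for a trained matching model (hypothetical).

    Two records match when the Jaccard similarity of their token sets
    is at least `threshold`.
    """
    tokens_a = set(re.findall(r"[a-z0-9]+", a.lower()))
    tokens_b = set(re.findall(r"[a-z0-9]+", b.lower()))
    if not tokens_a or not tokens_b:
        return False
    jaccard = len(tokens_a & tokens_b) / len(tokens_a | tokens_b)
    return jaccard >= threshold


# Each sample is (left record, right record, expected label).
# These records are illustrative placeholders for real data samples.
CORNER_CASES = [
    ("", "", False),                       # empty input
    ("USB-C Cable", "usb c cable", True),  # casing and punctuation differ
]
HARD_SAMPLES = [
    ("iPhone 13 case black", "iPhone 13 case", True),  # partial overlap
]
HOLDOUT_SET = [
    ("red cotton t-shirt", "cotton t-shirt red", True),
    ("wireless mouse", "wired keyboard", False),
    ("4k monitor 27 inch", "27 inch 4k monitor", True),
    ("office chair", "standing desk", False),
]


def failures(samples):
    """Return the samples on which the model disagrees with the label."""
    return [s for s in samples if predict_match(s[0], s[1]) != s[2]]


def accuracy(samples):
    """Fraction of samples on which the model agrees with the label."""
    return sum(predict_match(a, b) == y for a, b, y in samples) / len(samples)


def test_corner_cases():
    # Corner cases are few and well understood, so every one must pass.
    assert failures(CORNER_CASES) == []


def test_hard_samples():
    # Hard samples guard against regressions on known difficult inputs.
    assert failures(HARD_SAMPLES) == []


def test_holdout_accuracy():
    # A holdout set is judged in aggregate, against an assumed threshold.
    assert accuracy(HOLDOUT_SET) >= 0.75
```

The design choice in this sketch is that corner cases and hard samples are asserted one by one, while the holdout set is checked against an aggregate metric, since a single wrong prediction on a holdout set is usually not a reason to fail the build.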

However, there are also some disadvantages to using samples as tests. For example, if a model performs well on a particular sample, it does not ...