Samples as Tests

Learn how to use samples of real data for testing.

Overview

Samples as tests refers to using real data as test fixtures to evaluate the performance of a machine learning model. This is often more effective than relying on synthetic data, because synthetic data may not capture the complexity and variability of real-world data.
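As a minimal sketch of the idea, assuming a Python test setup, the snippet below treats a short list of labeled samples as fixtures and reports every sample the model gets wrong. The sentiment task, the `toy_model` keyword classifier, and the sample strings are all hypothetical stand-ins. In practice the samples would be drawn from real data and the model would be a trained one.

```python
from typing import Callable, List, Tuple

# A sample is (input, expected label). The strings below are hypothetical
# placeholders for samples that would normally be taken from real data.
Sample = Tuple[str, str]

SAMPLES: List[Sample] = [
    ("The delivery was fast and the product works great", "positive"),
    ("Terrible support, I want a refund", "negative"),
    ("", "neutral"),  # corner case: empty input
]


def toy_model(text: str) -> str:
    """Hypothetical stand-in for a trained classifier (keyword matching)."""
    words = text.lower().split()
    if any(w in ("great", "fast", "love") for w in words):
        return "positive"
    if any(w in ("terrible", "refund", "broken") for w in words):
        return "negative"
    return "neutral"


def failing_samples(model: Callable[[str], str], samples: List[Sample]) -> List[Sample]:
    """Return every sample whose prediction differs from its expected label."""
    return [(x, y) for x, y in samples if model(x) != y]


if __name__ == "__main__":
    failures = failing_samples(toy_model, SAMPLES)
    for text, expected in failures:
        print(f"FAIL: {text!r} expected {expected}")
    print(f"{len(SAMPLES) - len(failures)}/{len(SAMPLES)} samples passed")
```

Returning the failing samples, rather than a single pass/fail flag, makes it easy to see which real inputs regressed after a model update.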

Several types of real data can be used as samples for testing, including:

  • corner cases (edge cases that are unusual or extreme).

  • hard samples (samples that are difficult for the model to predict correctly).

  • representative samples (samples that accurately reflect the overall characteristics of the data).

  • holdout sets (data that is withheld during training and used to evaluate the model’s performance).
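The sketch below, again only an illustration under stated assumptions, shows how several of these sample types might be scored separately. It withholds a holdout set before a toy "training" step, then checks the holdout set, a few corner cases, and a few hard samples near the decision boundary against per-group minimum accuracies. The one-dimensional task, `fit_threshold`, the group thresholds, and the generated data (which only stands in for real samples) are all hypothetical.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical 1-D task: a sample is (feature, label) with label 0 or 1.
Sample = Tuple[float, int]
Model = Callable[[float], int]


def holdout_split(data: Sequence[Sample], fraction: float,
                  seed: int = 0) -> Tuple[List[Sample], List[Sample]]:
    """Shuffle a copy of the data and withhold `fraction` of it from training.
    Returns (train, holdout)."""
    rows = list(data)
    random.Random(seed).shuffle(rows)
    n_holdout = round(len(rows) * fraction)
    return rows[n_holdout:], rows[:n_holdout]


def fit_threshold(train: Sequence[Sample]) -> Model:
    """Toy 'training': pick the cut-off that classifies the most training samples correctly."""
    best = max((x for x, _ in train),
               key=lambda t: sum(int(x >= t) == y for x, y in train))
    return lambda x: int(x >= best)


def accuracy(model: Model, samples: Sequence[Sample]) -> float:
    """Fraction of samples whose prediction matches the expected label."""
    if not samples:
        raise ValueError("cannot score an empty sample set")
    return sum(model(x) == y for x, y in samples) / len(samples)


def failing_groups(model: Model, groups: Dict[str, Sequence[Sample]],
                   thresholds: Dict[str, float]) -> Dict[str, float]:
    """Return the accuracy of every group that falls below its own threshold."""
    failing = {}
    for name, samples in groups.items():
        acc = accuracy(model, samples)
        if acc < thresholds[name]:
            failing[name] = acc
    return failing


if __name__ == "__main__":
    # Generated placeholder data standing in for real labeled samples.
    data = [(i / 100, int(i >= 50)) for i in range(100)]
    train, holdout = holdout_split(data, fraction=0.2)
    model = fit_threshold(train)  # the holdout set is never seen here

    groups = {
        "holdout": holdout,
        "corner cases": [(0.0, 0), (1.0, 1)],  # extremes of the input range
        "hard samples": [(0.49, 0), (0.5, 1)],  # right at the decision boundary
    }
    thresholds = {"holdout": 0.9, "corner cases": 1.0, "hard samples": 0.5}
    print(failing_groups(model, groups, thresholds) or "all groups passed")
```

Giving each group its own threshold reflects that the groups play different roles: corner cases can be held to a strict bar, while hard samples may reasonably be allowed some errors.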

However, using samples as tests also has drawbacks. A model that performs well on a particular sample will not necessarily perform well on other data. Using samples as tests can also be time-consuming, because gathering and preparing real data may take longer than generating synthetic data. Finally, samples for testing must be selected carefully: samples that are never expected to fail can create a false sense of security, and the model may then fail in real-world scenarios that the tests did not cover.

Typically, the more time we have and the more important the update, the more data we use for testing, scaling from a handful of individual examples up to a large test sample. Overall, using samples as tests can be a valuable way to test a model's performance.
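One way to express this scaling, sketched here with hypothetical tier names, sample budgets, and accuracy bar, is to always run the hand-picked samples and add a larger random draw from a pool of real data as the update becomes more important. The placeholder pool and the stand-in model below are not from any real project.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Tuple[float, int]  # (input, expected label); hypothetical task
Model = Callable[[float], int]

# Hypothetical number of extra samples drawn per tier; real budgets depend
# on how much time is available and how important the update is.
TIER_SIZES: Dict[str, int] = {"smoke": 0, "standard": 200, "release": 2000}


def build_test_set(curated: Sequence[Sample], pool: Sequence[Sample],
                   tier: str, seed: int = 0) -> List[Sample]:
    """Always keep the hand-picked samples, then add a random draw from the
    pool whose size grows with the tier (capped at the pool size)."""
    extra = min(TIER_SIZES[tier], len(pool))
    return list(curated) + random.Random(seed).sample(list(pool), extra)


def passes(model: Model, samples: Sequence[Sample], min_accuracy: float) -> bool:
    """True if the model's accuracy on `samples` reaches `min_accuracy`."""
    correct = sum(model(x) == y for x, y in samples)
    return correct / len(samples) >= min_accuracy


def stand_in_model(x: float) -> int:
    """Hypothetical model used only to exercise the test tiers."""
    return int(x >= 0.5)


if __name__ == "__main__":
    curated = [(0.0, 0), (1.0, 1)]                           # individual examples
    pool = [(i / 1000, int(i >= 500)) for i in range(1000)]  # placeholder for real data
    for tier in TIER_SIZES:
        test_set = build_test_set(curated, pool, tier)
        print(tier, len(test_set), passes(stand_in_model, test_set, min_accuracy=0.95))
```

Fixing the seed keeps each tier's random draw reproducible, so a failure can be rerun on exactly the same samples.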
