Detecting Data Drift

Explore various methods for detecting data drift in machine learning pipelines, including statistical tests like the Kolmogorov-Smirnov test and algorithmic approaches such as Page-Hinkley and DDM. Understand how to identify shifts in data distributions to maintain model performance and prevent bias.

We'll cover the following...

Statistical methods
Drift algorithms
- Page-Hinkley
- Drift detection method (DDM)

Data drift can harm an ML model in deployment. As the underlying data changes, the model's predictions can become skewed or, worse, biased. In this lesson, we cover commonly used theoretical methods for identifying data drift.

Statistical methods

Statistical methods tend to be fast and require little implementation effort. They’re simple mathematical formulations that rely on hypothesis tests to detect drift at a chosen significance level.

Kolmogorov-Smirnov

The two-sample Kolmogorov-Smirnov (KS) test is a statistical hypothesis test with the following hypotheses:

$H_0$ : The two samples come from the same distribution.
$H_a$ : The two samples are drawn from different distributions.
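
As a rough illustration of these hypotheses in practice, the sketch below runs a two-sample KS test with SciPy's `ks_2samp` on synthetic data. The helper name `detect_drift_ks`, the significance level `alpha=0.05`, and the simulated "reference" and "shifted" samples are assumptions made for this example, not part of the lesson. The helper rejects $H_0$ (and flags drift) when the p-value falls below `alpha`.

```python
import numpy as np
from scipy.stats import ks_2samp


def detect_drift_ks(reference, current, alpha=0.05):
    """Two-sample KS test; returns (drifted, statistic, p_value).

    `alpha` is an assumed significance level. Drift is flagged when the
    p-value is below it, i.e. when H_0 (same distribution) is rejected.
    """
    result = ks_2samp(reference, current)
    drifted = bool(result.pvalue < alpha)
    return drifted, float(result.statistic), float(result.pvalue)


# Synthetic data for illustration only: the two samples have different sizes,
# and the second one has a shifted mean.
rng = np.random.default_rng(seed=0)
reference = rng.normal(loc=0.0, scale=1.0, size=1000)  # e.g. training-time feature values
shifted = rng.normal(loc=1.0, scale=1.0, size=800)     # same feature after a mean shift

drifted, statistic, p_value = detect_drift_ks(reference, shifted)
print(f"drift detected: {drifted}, KS statistic: {statistic:.3f}, p-value: {p_value:.3g}")
```

In a pipeline, a check like this would typically run per feature, comparing a stored reference window against a recent window of production data. When many features are tested at once, the significance level may need a multiple-comparison correction.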

For two samples of size $n$ and $m$ ...
