Information Leakage
In this lesson, learn how information leakage can produce machine learning models that overfit.
We'll cover the following...
What is information leakage?
Information leakage occurs when a machine learning algorithm has access to information about future data during the training process. A leaky model appears to predict better than it really does, so its evaluation metrics (e.g., accuracy) overestimate how useful it will be on genuinely new data.
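The effect can be illustrated with a small simulation. The following is a minimal sketch rather than a real modeling workflow: it uses only Python's standard library, and the names `accuracy`, `run_trial`, and `compare` are hypothetical helpers invented for this example. The labels and features are independent coin flips, so no feature has real predictive power. The "model" simply picks the single feature that agrees with the labels most often, and the only difference between the two variants is whether the held-out test rows are visible while that choice is made.

```python
import random


def accuracy(feature, labels, rows):
    """Fraction of the given rows where the feature value equals the label."""
    return sum(feature[i] == labels[i] for i in rows) / len(rows)


def run_trial(rng, n_rows=60, n_features=500, n_test=20):
    # Labels and features are independent coin flips, so no feature is
    # truly predictive: an honest accuracy estimate should be about 0.5.
    labels = [rng.randint(0, 1) for _ in range(n_rows)]
    features = [[rng.randint(0, 1) for _ in range(n_rows)]
                for _ in range(n_features)]

    all_rows = range(n_rows)
    train_rows = range(n_rows - n_test)
    test_rows = range(n_rows - n_test, n_rows)

    # Leaky "training": choose the best feature using ALL rows,
    # including the rows that are later used for testing.
    leaky = max(features, key=lambda f: accuracy(f, labels, all_rows))
    # Honest "training": choose the best feature using training rows only.
    honest = max(features, key=lambda f: accuracy(f, labels, train_rows))

    # Both "models" predict that the label equals the chosen feature's value.
    return (accuracy(leaky, labels, test_rows),
            accuracy(honest, labels, test_rows))


def compare(n_trials=20, seed=0):
    """Average the leaky and honest test accuracies over several trials."""
    rng = random.Random(seed)
    results = [run_trial(rng) for _ in range(n_trials)]
    leaky_mean = sum(r[0] for r in results) / n_trials
    honest_mean = sum(r[1] for r in results) / n_trials
    return leaky_mean, honest_mean


if __name__ == "__main__":
    leaky_mean, honest_mean = compare()
    print(f"leaky test accuracy:  {leaky_mean:.2f}")
    print(f"honest test accuracy: {honest_mean:.2f}")
```

Because the data is pure noise, the honest estimate should stay close to 0.5, while the leaky estimate typically comes out well above it. The gap between the two is the overestimate that leakage introduces, not evidence of a better model.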
Holdout test sets and cross-validation simulate future data by withholding part of the data, and the information it contains, from model training (e.g., the validation folds in cross-validation). There are two common sources of information leakage in practice: