Trusted answers to developer questions

Data leakage in machine learning

Educative Answers Team

Data leakage is a phenomenon that occurs when your model learns from data that shouldn't be part of the training data set, or from data that wouldn't be available in a real-life scenario at the moment a prediction is made. It is most common when your data set already contains the information that you're trying to predict, either directly or in disguised form, such as a feature that is only recorded after the outcome.
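To make the definition concrete, here is a minimal Python sketch of one common screening heuristic: a feature that is almost perfectly correlated with the target often turns out to be recorded after the outcome. The loan data, the column names `income` and `sent_to_collections`, and the 0.95 threshold are all hypothetical. A high correlation is only a warning sign that deserves a closer look, not proof of leakage.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (spread_x * spread_y)

def flag_suspicious_features(features, target, threshold=0.95):
    """Names of features whose absolute correlation with the target meets the threshold."""
    return sorted(
        name for name, values in features.items()
        if abs(pearson(values, target)) >= threshold
    )

# Hypothetical loan data: target is 1 if the customer defaulted.
target = [0, 1, 0, 1, 1, 0]
features = {
    "income": [52, 48, 61, 39, 45, 58],         # known when the loan is approved
    "sent_to_collections": [0, 1, 0, 1, 1, 0],  # only recorded after a default
}
print(flag_suspicious_features(features, target))  # ['sent_to_collections']
```

Here `income` is moderately correlated with defaulting (about -0.87) and is legitimately available at prediction time, while `sent_to_collections` simply restates the outcome, so a model trained on it would look perfect in testing and be useless in production.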

Time series forecasting

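Data leakage is especially common in time series forecasting, where the data points follow a chronological order: if the data is shuffled before it is split, the model can be trained on observations that come after the ones it is tested on.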
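The usual guard is to split by time instead of shuffling, so the model never trains on observations that come after the ones it is evaluated on. Below is a minimal Python sketch, assuming each record is a dict with a sortable timestamp field; `chronological_split`, `day` and `sales` are hypothetical names chosen for illustration.

```python
import random
from datetime import date, timedelta

def chronological_split(rows, time_key, test_fraction=0.2):
    """Split rows so that no training row is later than any test row."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    ordered = sorted(rows, key=lambda row: row[time_key])
    n_test = max(1, round(len(ordered) * test_fraction))
    cut = len(ordered) - n_test
    return ordered[:cut], ordered[cut:]

# Hypothetical daily sales, shuffled to show the split does not rely on input order.
rows = [{"day": date(2022, 1, 1) + timedelta(days=i), "sales": 100 + i} for i in range(10)]
random.Random(0).shuffle(rows)

train, test = chronological_split(rows, "day")
print(len(train), len(test))                                       # 8 2
print(max(r["day"] for r in train) < min(r["day"] for r in test))  # True
```

Sorting before cutting is the whole point: the test set is always the most recent slice of the data, which mirrors how the model will be used once deployed.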

Depending on the nature of the data set, the target variable may have a very similar distribution in both the training and the test sets. However, this may not hold true in real-life scenarios, where the model is applied to new, later data. The model can learn how the probability of each target value changes with the moment in time, so any time-related feature in the data set is a potential source of data leakage.
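The following Python sketch shows, under deliberately simplified assumptions, why a time feature is risky. A hypothetical predictor uses nothing but the timestamp: it copies the target of the nearest training timestamp. The series (`y = 2t`) is made up for illustration. Holding out every fifth point, which stands in for a shuffled split, lets the predictor interpolate between neighbors and look accurate; holding out the last 20 points, as a real forecast would, exposes it.

```python
def nearest_time_predictor(train):
    """Predict y for time t by copying the target of the closest training timestamp."""
    times = [t for t, _ in train]
    lookup = dict(train)

    def predict(t):
        nearest = min(times, key=lambda s: abs(s - t))
        return lookup[nearest]

    return predict

def mean_absolute_error(pairs, predict):
    return sum(abs(predict(t) - y) for t, y in pairs) / len(pairs)

# Hypothetical trending series of (time, target) pairs.
series = [(t, 2 * t) for t in range(100)]

# Interleaved split: every fifth point is held out (a stand-in for a shuffled split).
interleaved_train = [p for p in series if p[0] % 5 != 4]
interleaved_test = [p for p in series if p[0] % 5 == 4]

# Chronological split: the last 20 points are held out.
chrono_train, chrono_test = series[:80], series[80:]

mae_interleaved = mean_absolute_error(interleaved_test, nearest_time_predictor(interleaved_train))
mae_chrono = mean_absolute_error(chrono_test, nearest_time_predictor(chrono_train))
print(mae_interleaved, mae_chrono)  # 2.0 21.0
```

The error grows roughly tenfold once the model can no longer peek at neighboring points in time, even though the model itself did not change.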

Therefore, the first approach to counter data leakage in time series forecasting is to remove all the features that relate to time.
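As a minimal Python sketch of this first approach, the hypothetical helper below removes columns that hold `date`/`datetime` values, plus any derived time columns you name explicitly (a month number is just an integer, so it cannot be detected by type). The column names `timestamp`, `month`, `store_id`, `promo` and `sales` are assumptions for illustration.

```python
from datetime import date, datetime

def drop_time_features(rows, extra_time_columns=()):
    """Return (rows without time-related columns, sorted names of removed columns).

    Columns holding date or datetime values are detected automatically
    (datetime is a subclass of date); derived columns such as a month
    number are plain integers, so they must be listed explicitly.
    """
    time_columns = set(extra_time_columns)
    for row in rows:
        for name, value in row.items():
            if isinstance(value, date):
                time_columns.add(name)
    cleaned = [{k: v for k, v in row.items() if k not in time_columns} for row in rows]
    return cleaned, sorted(time_columns)

# Hypothetical store-level records; "sales" is the target.
rows = [
    {"timestamp": datetime(2022, 3, 1, 9), "month": 3, "store_id": 7, "promo": 1, "sales": 120},
    {"timestamp": datetime(2022, 3, 2, 9), "month": 3, "store_id": 7, "promo": 0, "sales": 95},
]
cleaned, removed = drop_time_features(rows, extra_time_columns=["month"])
print(removed)     # ['month', 'timestamp']
print(cleaned[0])  # {'store_id': 7, 'promo': 1, 'sales': 120}
```

If the data also needs a chronological train/test split, one reasonable pattern is to order and split on the timestamp first and drop it from the model inputs afterwards.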

Copyright ©2022 Educative, Inc. All rights reserved