Train/Test Split

Learn the principles of splitting data into train and test sets.

Principle behind the train/test split

When we train a forecasting model, the algorithm behind it optimizes the model's parameters to fit the data we feed into it. However, we often observe a phenomenon called overfitting: the learned parameters work well on the data used for training but poorly on new data.

In other words, our model fails to generalize. It is tuned so closely to the training data that it performs poorly whenever we apply it to data it has never seen before.
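As a toy illustration of this gap, here is a minimal sketch that is not part of the original lesson. It assumes synthetic data generated by a made-up rule (y = 2x plus noise) and a hypothetical "memorizer" model that simply returns the target of the closest training point. Such a model reproduces its training data perfectly, yet its error on fresh data drawn from the same process is clearly larger.

```python
import random

def make_data(n, rng):
    """Synthetic samples of y = 2x + noise (an assumption for illustration)."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    return [(x, 2 * x + rng.gauss(0, 3)) for x in xs]

def memorizer_predict(train, x):
    """Hypothetical model: return the target of the nearest training point."""
    return min(train, key=lambda point: abs(point[0] - x))[1]

def mean_squared_error(train, data):
    """Average squared error of the memorizer on the given (x, y) pairs."""
    errors = [(memorizer_predict(train, x) - y) ** 2 for x, y in data]
    return sum(errors) / len(errors)

rng = random.Random(0)
seen = make_data(50, rng)    # data the model was fitted on
unseen = make_data(50, rng)  # new data from the same process

# Each training point is its own nearest neighbor, so this error is zero.
train_error = mean_squared_error(seen, seen)
# On new data the memorized noise no longer helps, so the error is larger.
new_error = mean_squared_error(seen, unseen)

print(f"error on training data: {train_error:.2f}")
print(f"error on new data:      {new_error:.2f}")
```

The zero training error says nothing about how the model behaves on new data, which is why a separate measurement is needed.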

To avoid that issue, we can randomly split our data into two sets:

  1. Train: Data that we'll use to train our model.

  2. Test: Data that we'll use to test our model performance. (It can also be used to tune hyperparameters and compare models.)
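The random split described above can be sketched in a few lines of plain Python. This is an illustrative sketch rather than the lesson's own implementation: the helper name `split_train_test`, the 20% test share, and the fixed seed are all assumptions chosen for the example.

```python
import random

def split_train_test(rows, test_ratio=0.2, seed=42):
    """Shuffle a copy of rows and return (train, test).

    test_ratio and seed are illustrative defaults, not values from the lesson.
    """
    if not 0 < test_ratio < 1:
        raise ValueError("test_ratio must be strictly between 0 and 1")
    shuffled = list(rows)                  # copy so the input stays untouched
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    n_test = max(1, round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]

data = list(range(100))  # stand-in for 100 observations
train, test = split_train_test(data, test_ratio=0.2)

print(len(train), len(test))
```

Every observation ends up in exactly one of the two sets, so the model is never scored on rows it was fitted to. In practice, libraries offer equivalent helpers, such as `train_test_split` in scikit-learn.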
