Learning Curves

Learning curves help us evaluate a model's performance and guide decisions during training. This lesson covers the concepts behind learning curves.

Learning Curves

In the previous lessons, we trained our model on the training dataset and tested the trained model on the testing dataset. In the lessons on Regression Problems, we introduced another dataset: the validation dataset. It is also called a development dataset because it is used during the development of a model, and we have already covered the intuition behind using it.

Let's look at some learning curve plots that illustrate the model's performance on the training and validation datasets and help us avoid overfitting and other problems. Two questions to consider: how long should the model be trained, and how large should the training dataset be? Evaluating the model on the training dataset tells us how well the model is learning; evaluating it on the validation dataset tells us how well the model is generalizing.
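To make this concrete, here is a minimal sketch of comparing a model's error on the training set with its error on a held-out validation set. The synthetic regression dataset and the linear model are illustrative placeholders, not the data or model from earlier lessons.

```python
# Minimal sketch: training error vs. validation error for one model.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Placeholder dataset standing in for the data used in earlier lessons.
X, y = make_regression(n_samples=1000, n_features=10, noise=15.0, random_state=42)

# Hold out a validation (development) set.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression().fit(X_train, y_train)

train_error = mean_squared_error(y_train, model.predict(X_train))  # how well the model is learning
val_error = mean_squared_error(y_val, model.predict(X_val))        # how well the model is generalizing

print(f"Training MSE:   {train_error:.2f}")
print(f"Validation MSE: {val_error:.2f}")
```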

Learning curve (impact of training dataset size)

We train our models using different sizes (m) of the training dataset and record the error on both the training dataset and the validation dataset. We then plot the respective training and validation errors against the training set size m (a code sketch of this procedure appears after the list below).

  • Validation error decreases as the number of instances, m, in the training dataset increases.
  • Training error increases as the number of instances, m, in the training dataset increases.
  • The goal is to reach the desired performance by increasing the size of the training dataset. If the curve flattens out before the desired performance is reached, collecting more training data will not improve performance.
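The sketch below illustrates this procedure using scikit-learn's learning_curve utility on a synthetic placeholder dataset. One simplification to note: learning_curve averages errors over cross-validation folds rather than using a single fixed validation set, but the resulting plot of training and validation error against m is the same kind of curve described above.

```python
# Minimal sketch: plot training and validation error against training set size m.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import learning_curve

# Placeholder dataset standing in for the data used in earlier lessons.
X, y = make_regression(n_samples=1000, n_features=10, noise=15.0, random_state=42)

# Train on increasing fractions of the data and record both errors.
train_sizes, train_scores, val_scores = learning_curve(
    LinearRegression(), X, y,
    train_sizes=np.linspace(0.1, 1.0, 10),
    cv=5,
    scoring="neg_mean_squared_error",
)

train_error = -train_scores.mean(axis=1)  # negate to recover MSE
val_error = -val_scores.mean(axis=1)

plt.plot(train_sizes, train_error, label="Training error")
plt.plot(train_sizes, val_error, label="Validation error")
plt.xlabel("Training set size (m)")
plt.ylabel("Mean squared error")
plt.legend()
plt.show()
```

If the validation error curve flattens out well above the desired error level, the plot suggests that adding more training data alone will not close the gap.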
