Measuring ML Model Performance
Explore the essentials of measuring machine learning model performance, including how to use loss functions to quantify errors. Understand key concepts such as training error, test error, overfitting, underfitting, and the bias-variance tradeoff. Learn how dataset splitting into training, validation, and test sets helps to select optimal model parameters for reliable predictions.
When we build an ML model, we use data to create it. Consider the problem of predicting a student's skill level from their answers to different questions. We take the data and create a classification model that rates the student's skill on a scale of 1 to 5 (integers). We then verify this model on the same data that was available to build it.
However, what happens when we deploy this model to production? It may work well for some students but give poor results for others. Perhaps some highly skilled students received lower ratings because of wrong answers to a particular question. So we may end up creating either a good model or a bad model that no one wants to use. Our goal is always to create the best model. When a business depends on ML models, a reliable, good model is essential.
Loss function
So, we want to know how successful the model is or the amount of loss associated with our predictions. This can be measured with a loss function.
Loss = L(y, f_w(x))
Here, y stands for the true value, and f_w(x) is the predicted value. L is the function that takes these two and computes the loss.
For example
Squared error function: (y − f_w(x))²
Absolute error function: |y − f_w(x)|
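As a quick illustration, here is a minimal sketch of these two loss functions in Python; the true values and predictions are made-up numbers for demonstration only:

```python
# Hypothetical true labels and model predictions (made-up numbers).
y_true = [3.0, 1.5, 4.0, 2.0]
y_pred = [2.5, 2.0, 3.0, 2.0]

def squared_error(y, f):
    """Squared error loss: (y - f_w(x))^2 for each example."""
    return [(yi - fi) ** 2 for yi, fi in zip(y, f)]

def absolute_error(y, f):
    """Absolute error loss: |y - f_w(x)| for each example."""
    return [abs(yi - fi) for yi, fi in zip(y, f)]

print(squared_error(y_true, y_pred))   # [0.25, 0.25, 1.0, 0.0]
print(absolute_error(y_true, y_pred))  # [0.5, 0.5, 1.0, 0.0]
```

Note how squared error punishes the larger mistake (1.0 vs. 0.25) more heavily than absolute error does, which is why the choice of loss function matters.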
Training error
Training error is the error we get for the training data. Training data is used for training the model. So, we create the model and get the error estimates on the same data.
How does training error change with increased model complexity?
Is training error a good performance measure for any ML model?
Now, the complete data looks like this:
A highly complex model achieves an optimistic (low) training error but misses the distribution of the true data.
Quiz: Training error
You have been given two datasets. Both have the same instances, and the features of one dataset are a subset of the other's. Which of the following statements is true about the training errors?
Training errors would be the same for both datasets for any model
Training errors would be higher for larger feature dataset
Training errors would be higher for smaller feature dataset
Can’t answer with the provided information
Test error
We can create a test set from the data. Test data is the data that was not used for model training. So, we train the model on training data and check its performance on testing data.
How does the testing error change with increasing model complexity?
Is testing error a good performance measure for any ML model?
Overfitting
Now let’s understand an important concept in machine learning: overfitting.
Overfitting occurs when there is a model with estimated parameters w′ whose training error is lower than that of some other model w, but whose true error is higher:
Error_train(w′) < Error_train(w), yet Error_true(w′) > Error_true(w)
In other words, if our model performs very well on training data but not on testing data or true data, that is overfitting.
From the example above, if we keep increasing the complexity, we can get a smaller training error, but the testing error may increase at some point.
For example, if we keep training models on training data and reduce the training error, we may get a perfect model on training data. However, it might not perform well on unseen data or test data.
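This effect can be seen concretely with a small simulation. The sketch below uses synthetic data from an assumed quadratic true function (both the function and the noise level are made up for illustration) and fits polynomials of increasing degree, comparing training and test MSE:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed true relationship (toy example): y = 0.5 * x^2 plus Gaussian noise.
def true_f(x):
    return 0.5 * x ** 2

x_train = np.linspace(-3, 3, 15)
y_train = true_f(x_train) + rng.normal(0, 1.0, x_train.size)
x_test = np.linspace(-3, 3, 100)
y_test = true_f(x_test) + rng.normal(0, 1.0, x_test.size)

def errors(degree):
    """Fit a polynomial of the given degree on the training set;
    return (training MSE, test MSE)."""
    coefs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
    return train_mse, test_mse

for d in (1, 2, 10):
    tr, te = errors(d)
    print(f"degree={d:2d}  train MSE={tr:.3f}  test MSE={te:.3f}")
```

The training MSE can only go down as the degree increases (a higher-degree fit contains the lower-degree one), while the test MSE stops improving once the extra complexity starts fitting noise.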
Underfitting
Underfitting is a situation where the model has not learned enough about the data, producing poor generalization and bad predictions. For example, if we do not train the model enough, it will not be good enough for prediction. This is underfitting.
Sources of error
We can think of three sources of error in the model:
- Noise
- Bias
- Variance
Noise
Real data is noisy. When we fit a model on the dataset, data points deviate from the true relationship by some noise value. Even with the best parameters, we usually cannot reduce the error to zero. The true relationship between y and x is:
y = f_w(x) + ε_noise
This is also called an irreducible error. We cannot control this.
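A quick simulation shows this. Even if we predict with the true function itself (here an assumed linear relationship with Gaussian noise, both made up for illustration), the error floor is roughly the noise variance:

```python
import numpy as np

rng = np.random.default_rng(1)

x = np.linspace(0, 1, 10_000)
noise = rng.normal(0, 0.5, x.size)  # noise with standard deviation 0.5
y = 2 * x + noise                   # assumed true relationship plus noise

best_pred = 2 * x                   # predictions from the *true* function
mse = np.mean((y - best_pred) ** 2)
print(mse)                          # close to 0.25, the noise variance
```

No model can do better than this on average, which is why the noise term is called irreducible error.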
Bias
Bias measures how well our model can fit the true relationship between the dependent and independent variables. It is the difference between the true function and the average of our estimated fits over training datasets:
Bias(x) = f_true(x) − f_estimated(x)
Here, f_estimated(x) is the estimated fit averaged over all possible training datasets.
Variance
Variance measures how much the fitted models differ across training samples — how a specific fit deviates from the average estimated fit. If we fit the model on different samples of the dataset, the variance is the average squared difference between each specific fit and the average fit:
Variance(x) = average over all possible fits i of (f_estimated(x) − f_i(x))²
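These quantities can be estimated empirically by refitting a model on many resampled training sets. The sketch below assumes a sine true function and Gaussian noise (both arbitrary choices for illustration), fits polynomials of different degrees on each sample, and averages squared bias and variance over a grid of evaluation points:

```python
import numpy as np

rng = np.random.default_rng(42)

def true_f(x):
    return np.sin(x)  # assumed true relationship (toy example)

x_eval = np.linspace(0, np.pi, 50)   # points where we evaluate each fit
n_runs, n_train, noise_sd = 200, 20, 0.3

def bias_variance(degree):
    """Fit polynomials of `degree` on many noisy training samples;
    return (bias^2, variance), each averaged over x_eval."""
    fits = []
    for _ in range(n_runs):
        x = rng.uniform(0, np.pi, n_train)
        y = true_f(x) + rng.normal(0, noise_sd, n_train)
        coefs = np.polyfit(x, y, degree)
        fits.append(np.polyval(coefs, x_eval))
    fits = np.array(fits)            # shape: (n_runs, len(x_eval))
    mean_fit = fits.mean(axis=0)     # f_estimated(x): average fit
    bias_sq = np.mean((true_f(x_eval) - mean_fit) ** 2)
    variance = np.mean(fits.var(axis=0))
    return bias_sq, variance

for d in (1, 3, 9):
    b, v = bias_variance(d)
    print(f"degree={d}  bias^2={b:.4f}  variance={v:.4f}")
```

The low-complexity (degree-1) model shows high bias and low variance, while the high-complexity (degree-9) model shows low bias and high variance, which is exactly the tradeoff discussed next.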
What would the mean of noise error with the best model be?
Quiz: Bias and variance
For the low complexity model, what would the values of bias and variance be?
Low bias, low variance
Low bias, high variance
High bias, low variance
High bias, high variance
Bias-variance tradeoff
As model complexity increases, bias decreases while variance increases. This is known as the bias-variance tradeoff. You can see this in the illustration below.
Can we compute the bias and variance error?
How do true errors behave with increasing data and fixed model complexity?
How do training errors behave with increasing data and a fixed model complexity?
Quiz: Data size
For a fixed model complexity (parameter value), which of these errors exist? (Choose all that are valid)
Bias
Variance
Noise
Dataset splitting
Earlier, we saw how to divide the dataset into train and test sets. We train our model on the training dataset and check its performance on the testing dataset. For any model, we have different parameters to select, and these parameters control model complexity. For example, in a tree-based model, the parameters could include the depth of the tree or how many trees should be created in the model. Selecting good parameters can lead to low true error. Now the question is: how can we choose our parameters?
One way of doing this is training on the training dataset with pre-selected parameter values (or complexity) and evaluating the performance on the test dataset. We check the model performance for each parameter value and select the best model that gives the lowest testing error.
Any problem with this approach?
Yes! In the above approach, we make each decision based on the performance on the test data. In other words, we are exposing our test set while building the model, so the test error is no longer an honest estimate of true performance.
To handle this situation, we can split the data into three parts instead of two: train, validation, and test. We train the model on the training set. Then, during parameter selection, we evaluate each candidate on the validation set, and only after the final model is created do we measure its performance on the test set.
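Here is a minimal sketch of such a three-way split in plain Python; the 60/20/20 fractions and the toy dataset are arbitrary choices for illustration:

```python
import random

# Toy dataset of (feature, label) pairs; in practice these would be real examples.
data = [(i, i % 5) for i in range(100)]

def three_way_split(data, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle the data and split it into train / validation / test sets."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)  # fixed seed for reproducibility
    n = len(shuffled)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

train_set, val_set, test_set = three_way_split(data)
print(len(train_set), len(val_set), len(test_set))  # 60 20 20
```

In use, we would fit each candidate parameter setting on `train_set`, pick the setting with the lowest error on `val_set`, and report the chosen model's error on `test_set` exactly once.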
Quiz: Performance of an ML model
If we have high errors on both training and testing sets, what could the problem be?
Low bias, low variance
Low bias, high variance
High bias, low variance
High bias, high variance