
Measuring ML Model Performance

Explore the essentials of measuring machine learning model performance, including how to use loss functions to quantify errors. Understand key concepts such as training error, test error, overfitting, underfitting, and the bias-variance tradeoff. Learn how dataset splitting into training, validation, and test sets helps to select optimal model parameters for reliable predictions.

When we build an ML model, we create it from data. Consider the problem of predicting a student's skill level from their answers to different questions. We take the data and create a classification model that rates a student's skill on an integer scale of 1 to 5. We then verify this model on the same data that was available to build it.

However, what happens when we deploy this model to production? It may work well for some students but give bad results for others. Perhaps some highly skilled students received lower ratings because of wrong answers to a particular question. So we may end up with either a good model or a bad one that no one wants to use. Our goal is always to create the best model we can. When a business depends on ML models, building a reliable, good model is essential.

Loss function

So, we want to know how successful the model is, or equivalently, how much loss is associated with its predictions. This can be measured with a loss function.

Loss function = L(y, fw(x))

Here, y stands for the true value, and fw(x) is the predicted value (the model f with parameters w applied to input x). L is a function that takes these two and computes the loss.

For example:

Squared error function: (y − fw(x))²

Absolute error function: |y − fw(x)|
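As a quick sketch, the two example loss functions above can be written directly in plain Python:

```python
def squared_error(y_true, y_pred):
    # Squared error: (y - fw(x))^2 -- penalizes large mistakes heavily.
    return (y_true - y_pred) ** 2

def absolute_error(y_true, y_pred):
    # Absolute error: |y - fw(x)| -- penalizes mistakes linearly.
    return abs(y_true - y_pred)

# For the same mistake of size 2, squared error reports 4, absolute error 2.
print(squared_error(5, 3))   # 4
print(absolute_error(5, 3))  # 2
```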

Training error

Training error is the error we get on the training data, i.e., the data that was used to train the model. So, we create the model and estimate its error on the same data.
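To make this concrete, here is a hedged sketch that computes a training error as the average squared loss over the training data. The "model" is deliberately trivial (a constant that predicts the mean label), just enough to show the computation; it is not one of the models discussed in the lesson:

```python
# Made-up training labels for illustration.
train_y = [1, 2, 3, 4, 5]

# "Fitting" the trivial model: it always predicts the mean of the labels.
mean_prediction = sum(train_y) / len(train_y)

def training_error(ys, prediction):
    # Average squared error over the training set.
    return sum((y - prediction) ** 2 for y in ys) / len(ys)

print(training_error(train_y, mean_prediction))  # 2.0
```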

Two questions to think about:

• How does the training error change with increasing model complexity?
• Is training error a good performance measure for any ML model?

Plotting the complete data against the fit shows that a highly complex model is optimistic about the training data but misses the distribution of the true data.

Quiz: Training error

You have been given two datasets with the same instances, where the features of one dataset are a subset of the other's. Which of the following statements about the training errors is true?

A. Training errors would be the same for both datasets for any model
B. Training errors would be higher for the dataset with more features
C. Training errors would be higher for the dataset with fewer features
D. Can't answer with the provided information

Test error

We can set aside a test set from the data: data that was not used for model training. So, we train the model on the training data and check its performance on the test data.
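A minimal sketch of such a split, using only the standard library; the data here is a made-up list of (x, y) pairs for illustration, and the 80/20 ratio is a common convention, not a rule:

```python
import random

random.seed(0)  # fixed seed so the split is reproducible

# Made-up dataset of (x, y) pairs.
data = [(x, 2 * x + 1) for x in range(10)]

random.shuffle(data)  # shuffle before splitting to avoid ordering bias

split = int(0.8 * len(data))
train, test = data[:split], data[split:]

print(len(train), len(test))  # 8 2
```

The model would be fit on `train` only; `test` is held out purely for measuring the test error.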

Two questions to think about:

• How does the test error change with increasing model complexity?
• Is test error a good performance measure for any ML model?

Overfitting

Now let’s understand an important concept in machine learning: overfitting.

Overfitting occurs when our model, with estimated parameters p, does very well on the training data but poorly on test or true data. Formally, the model with parameters p overfits if there exists another set of parameters p' such that:

• Training error(p) < Training error(p')
• True error(p) > True error(p')

From the example above, if we keep increasing the complexity, we get a smaller training error, but at some point the test error starts to increase.

For example, if we keep training models on the training data and reducing the training error, we may get a model that is perfect on the training data. However, it might not perform well on unseen or test data.
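This behavior can be simulated with a hedged sketch: fit a low-degree and a high-degree polynomial to the same small noisy sample. The data-generating function, noise level, and degrees are assumptions chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

# Made-up data: a sine curve plus noise.
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, 10)
x_test = np.linspace(0.05, 0.95, 10)  # unseen points
y_test = np.sin(2 * np.pi * x_test) + rng.normal(0, 0.2, 10)

def mse(coeffs, x, y):
    # Mean squared error of a fitted polynomial on (x, y).
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

simple = np.polyfit(x_train, y_train, 1)    # low complexity: a straight line
complex_ = np.polyfit(x_train, y_train, 9)  # passes through all 10 points

# The degree-9 fit drives the training error to (almost) zero, but that says
# nothing about how it behaves between and beyond the training points.
print(f"train MSE: simple={mse(simple, x_train, y_train):.4f}, "
      f"complex={mse(complex_, x_train, y_train):.2e}")
print(f"test  MSE: simple={mse(simple, x_test, y_test):.4f}, "
      f"complex={mse(complex_, x_test, y_test):.4f}")
```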

Underfitting

Underfitting is the situation where the model has not learned enough about the data, resulting in poor generalization and bad predictions.

For example, if we do not train the model enough, it will not be good enough for prediction. This is underfitting.

Sources of error

We can think of three sources of error in a model:

• Noise
• Bias
• Variance

Noise

Real data is noisy. When we fit a model to a dataset, the data points deviate from the true relationship by some noise value. Even with the best parameters, we usually cannot reduce the error to zero. The true relationship between y and x is:

y = fw(x) + Error(noise)

This is also called the irreducible error. We cannot control it.

Bias

Bias measures how well our model can capture the true relationship between the dependent and independent variables. It is the difference between the true function and the function we estimate from the dataset:

Bias = fw(true)(x) − fw(estimated)(x)

Variance

Variance measures how much the different model fits differ from one another. If we fit the model on different samples of the dataset, the variance is how much a specific fit fw(i) differs from the average estimated fit:

Variance = average over all possible fits i of (fw(i)(x) − fw(estimated)(x))²

A question to think about:

• What would the mean of the noise error be with the best model?

Quiz: Bias and variance

For a low-complexity model, what would the values of bias and variance be?

A. Low bias, low variance
B. Low bias, high variance
C. High bias, low variance
D. High bias, high variance

Bias-variance tradeoff

As model complexity increases, bias keeps coming down while variance keeps increasing. This is known as the bias-variance tradeoff.
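The tradeoff can be seen in a hedged simulation: repeatedly draw noisy samples from a known true function, fit a simple and a complex model to each sample, and measure bias and variance of the predictions on a fixed grid. The true function, polynomial degrees, and noise level are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def true_f(x):
    return np.sin(2 * np.pi * x)

x_grid = np.linspace(0.1, 0.9, 20)  # points where we compare predictions

def fit_many(degree, n_fits=200):
    # Fit a polynomial of the given degree to n_fits independent noisy samples.
    preds = []
    for _ in range(n_fits):
        x = rng.uniform(0, 1, 15)
        y = true_f(x) + rng.normal(0, 0.3, 15)
        preds.append(np.polyval(np.polyfit(x, y, degree), x_grid))
    return np.array(preds)

def bias2_and_variance(preds):
    mean_pred = preds.mean(axis=0)                # the average fitted model
    bias2 = float(np.mean((mean_pred - true_f(x_grid)) ** 2))
    variance = float(np.mean(preds.var(axis=0)))  # spread across the fits
    return bias2, variance

b_simple, v_simple = bias2_and_variance(fit_many(degree=0))    # constant model
b_complex, v_complex = bias2_and_variance(fit_many(degree=8))  # wiggly model

# The simple model misses the shape (high bias); the complex model changes a
# lot from sample to sample (high variance).
print(f"bias^2: simple={b_simple:.3f}, complex={b_complex:.3f}")
print(f"variance: simple={v_simple:.3f}, complex={v_complex:.3f}")
```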

Some questions to think about:

• Can we compute the bias and variance errors?
• How does the true error behave with increasing data and fixed model complexity?
• How does the training error behave with increasing data and fixed model complexity?

Quiz: Data size

For a fixed model complexity (parameter value), which of these errors exists? (Choose all that are valid)

A. Bias
B. Variance
C. Noise

Dataset splitting

Earlier, we saw how to divide the dataset into train and test sets: we train our model on the training set and check its performance on the test set. For any model, we have different parameters to select, and these parameters control model complexity. For example, in a tree-based model, the parameters could include the depth of the trees or the number of trees in the model. Selecting good parameters can lead to a low true error. Now the question is: how do we choose our parameters?

One way is to train on the training dataset with pre-selected parameter values (or complexity levels) and evaluate the performance on the test dataset. We check the model's performance for each parameter value and select the model with the lowest test error.

Is there a problem with this approach?

Yes! In the approach above, we make every decision based on performance on the test data. In other words, we expose our test set during model training, which makes the test error an overly optimistic estimate.

To handle this, we can split the data into three parts instead of two: train, validation, and test. We train the model on the training set, evaluate candidate parameter values on the validation set, and, after the final model is chosen, measure its performance on the test set.
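The three-way split can be sketched as follows; the 60/20/20 proportions are a common convention, not a rule:

```python
import random

random.seed(1)

# Indices of 100 hypothetical examples.
indices = list(range(100))
random.shuffle(indices)

n = len(indices)
train_idx = indices[: int(0.6 * n)]             # fit model parameters here
val_idx = indices[int(0.6 * n): int(0.8 * n)]   # pick hyperparameters here
test_idx = indices[int(0.8 * n):]               # touch only once, at the end

print(len(train_idx), len(val_idx), len(test_idx))  # 60 20 20
```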

Data splitting

Quiz: Performance of an ML model

If we have high errors on both the training and test sets, what could the problem be?

A. Low bias, low variance
B. Low bias, high variance
C. High bias, low variance
D. High bias, high variance