Precision, Recall, and Confusion Matrix

We have learned about various ML models, but how do we evaluate them? For regression, we can measure the difference between the actual and the predicted values using a metric such as the Root Mean Square Error (RMSE), but what about classification models?
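As a quick, minimal sketch (the arrays below are made-up example values, not from the lesson), RMSE can be computed directly with NumPy:

```python
import numpy as np

# Hypothetical actual and predicted values from a regression model
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.8, 5.4, 2.9, 6.1])

# RMSE: square root of the mean squared difference
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
print(f"RMSE: {rmse:.3f}")
```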

One might think that accuracy is a good enough measure of how good a model is. Accuracy is an important evaluation metric, but it is not always the best one. Let’s understand this with an example.

The Accuracy Trap

Say we are building a model that predicts whether patients have a chronic illness. We know that only 0.5% of the patients have the disease, i.e., are “Positive” cases. A dummy model could always return “Negative” as a default result and still achieve a high accuracy of 99.5% because our dataset is skewed: since only 0.5% of the patients have the disease, answering “Negative” for 100% of the cases still gets the prediction right 99.5% of the time. We have a model with very high accuracy, but is it any good? Absolutely not! This is where some other performance measures come into play.
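A small sketch of this accuracy trap, using made-up numbers that mirror the example: a classifier that always predicts “Negative” on a dataset where only 0.5% of cases are positive still scores 99.5% accuracy.

```python
import numpy as np

# Hypothetical dataset: 10,000 patients, 0.5% of whom are actually positive
n = 10_000
y_true = np.zeros(n, dtype=int)
y_true[: int(0.005 * n)] = 1  # 50 positive cases

# Dummy model: always predict "Negative" (0)
y_pred = np.zeros(n, dtype=int)

accuracy = np.mean(y_true == y_pred)
print(f"Accuracy: {accuracy:.1%}")  # 99.5%, yet the model misses every sick patient
```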

The Confusion Matrix

Before we talk about these measures, let’s understand a few terms (a short code sketch after the list shows how these counts are computed):

  1. TP / True Positive: the case was positive, and it was predicted as positive
  2. TN / True Negative: the case was negative, and it was predicted as negative
  3. FN / False Negative: the case was positive, but it was predicted as negative
  4. FP / False Positive: the case was negative, but it was predicted as positive
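As a rough illustration, the four counts can be tallied directly with NumPy; the label arrays below are hypothetical 0/1 values (1 = positive, 0 = negative).

```python
import numpy as np

# Hypothetical ground-truth labels and model predictions
y_true = np.array([1, 0, 1, 1, 0, 0, 0, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 0, 1, 0, 0])

tp = np.sum((y_true == 1) & (y_pred == 1))  # positive, predicted positive
tn = np.sum((y_true == 0) & (y_pred == 0))  # negative, predicted negative
fn = np.sum((y_true == 1) & (y_pred == 0))  # positive, predicted negative
fp = np.sum((y_true == 0) & (y_pred == 1))  # negative, predicted positive

print(f"TP={tp}  TN={tn}  FN={fn}  FP={fp}")
```

For reference, scikit-learn’s `confusion_matrix(y_true, y_pred)` produces the same counts arranged as a 2×2 matrix, `[[TN, FP], [FN, TP]]`.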

Since pictures help us to remember things better:
