Precision, Recall, and Confusion Matrix

We have learned about various ML models, but how do we evaluate them? For regression, we can measure the difference between the actual and the predicted values using a metric such as the Root Mean Square Error (RMSE), but what about classification models?
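As a quick, minimal sketch (the arrays below are made-up example values, not from the lesson), RMSE can be computed directly with NumPy:

```python
import numpy as np

# Hypothetical actual and predicted values from a regression model
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.8, 5.4, 2.9, 6.1])

# RMSE: square root of the mean squared difference
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
print(f"RMSE: {rmse:.3f}")
```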

One might think that accuracy is a good enough measure of how good a model is. Accuracy is an important evaluation metric, but it is not always the best one. Let’s understand this with an example.

The Accuracy Trap

Say we are building a model that predicts whether patients have a chronic illness. We know that only 0.5% of the patients have the disease, i.e., are “Positive” cases. A dummy model could always return “Negative” as a default result and still achieve a high accuracy of 99.5% because our dataset is skewed: since only 0.5% of the patients have the disease, answering “Negative” for 100% of the cases still gets the prediction right 99.5% of the time. We have a model with very high accuracy, but is it any good? Absolutely not! This is where some other performance measures come into play.
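A small sketch of this accuracy trap, using made-up numbers that mirror the example: a classifier that always predicts “Negative” on a dataset where only 0.5% of cases are positive still scores 99.5% accuracy.

```python
import numpy as np

# Hypothetical dataset: 10,000 patients, 0.5% of whom are actually positive
n = 10_000
y_true = np.zeros(n, dtype=int)
y_true[: int(0.005 * n)] = 1  # 50 positive cases

# Dummy model: always predict "Negative" (0)
y_pred = np.zeros(n, dtype=int)

accuracy = np.mean(y_true == y_pred)
print(f"Accuracy: {accuracy:.1%}")  # 99.5%, yet the model misses every sick patient
```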

The Confusion Matrix

Before we talk about these measures, let’s understand a few terms (a short code sketch after the list shows how these counts are computed):

  1. TP / True Positive: the case was positive, and it was predicted as positive
  2. TN / True Negative: the case was negative, and it was predicted as negative
  3. FN / False Negative: the case was positive, but it was predicted as negative
  4. FP / False Positive: the case was negative, but it was predicted as positive
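As a rough illustration, the four counts can be tallied directly with NumPy; the label arrays below are hypothetical 0/1 values (1 = positive, 0 = negative).

```python
import numpy as np

# Hypothetical ground-truth labels and model predictions
y_true = np.array([1, 0, 1, 1, 0, 0, 0, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 0, 1, 0, 0])

tp = np.sum((y_true == 1) & (y_pred == 1))  # positive, predicted positive
tn = np.sum((y_true == 0) & (y_pred == 0))  # negative, predicted negative
fn = np.sum((y_true == 1) & (y_pred == 0))  # positive, predicted negative
fp = np.sum((y_true == 0) & (y_pred == 1))  # negative, predicted positive

print(f"TP={tp}  TN={tn}  FN={fn}  FP={fp}")
```

For reference, scikit-learn’s `confusion_matrix(y_true, y_pred)` produces the same counts arranged as a 2×2 matrix, `[[TN, FP], [FN, TP]]`.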

Since pictures help us to remember things better:
