Related Tags

classification
machine learning

# What is a confusion matrix?

A confusion matrix is a tabular summary of the number of correct and incorrect predictions made by a classifier. It is used to measure the performance of a classification model.

Confusion matrices are widely used because they give a better idea of a model’s performance than classification accuracy does. For example, in classification accuracy, there is no information about the number of misclassified instances. Imagine that your data has two classes where 85% of the data belongs to class A, and 15% belongs to class B. Also, ​assume that your classification model correctly classifies all the instances of class A, and misclassifies all the instances of class B. In this case, the model is 85% accurate. However, class B is misclassified, which is undesirable. The confusion matrix, on the other hand, displays the correctly and incorrectly classified instances for all the classes and will​, therefore, give a better insight into the performance of your classifier.

## Understanding the confusion matrix

Suppose that the test data consists of 40 records. The classifier will then predict whether or not a person has the flu. The following confusion matrix was generated based on the classifier’​s results:

The actual outcomes contain information on whether or not each individual had the flu, while the predicted outcomes are the predictions made by the classification model.

According to the information provided by the confusion matrix:

• The model correctly predicted that 13 people had the flu.

• The model falsely predicted that 3 people did not have the flu.

• The model correctly classified 20 people in the has_flu = no category.

• The model incorrectly classified that 4 people (who actually did not have the flu) had the flu.

• The model predicted that 17 (13 + 4) people had the flu.

• The model predicted that 23 (20 + 3) people did not have the flu.

• In actuality, 16 (13 + 3) people had the flu.

• In actuality, 24 (4 + 20) people did not have the flu.

The following illustration summarizes some important facts:

The same concepts apply to a multi-class classifier. For $n$ classes, the confusion matrix will have $n * n$ dimensions.

RELATED TAGS

classification
machine learning