A confusion matrix is a tabular summary of the number of correct and incorrect predictions made by a classifier. It is used to measure the performance of a classification model.
Confusion matrices are widely used because they give a better idea of a model’s performance than classification accuracy does. For example, in classification accuracy, there is no information about the number of misclassified instances. Imagine that your data has two classes where 85% of the data belongs to class A, and 15% belongs to class B. Also, assume that your classification model correctly classifies all the instances of class A, and misclassifies all the instances of class B. In this case, the model is 85% accurate. However, class B is misclassified, which is undesirable. The confusion matrix, on the other hand, displays the correctly and incorrectly classified instances for all the classes and will, therefore, give a better insight into the performance of your classifier.
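A minimal sketch of this scenario (the 85%/15% split is taken from the paragraph above; the label names and the always-predict-A classifier are hypothetical):

```python
# Hypothetical data: 85 instances of class A, 15 of class B.
y_true = ["A"] * 85 + ["B"] * 15

# A degenerate classifier that predicts class A for every instance.
y_pred = ["A"] * 100

# Classification accuracy looks good: 85% of predictions are correct.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)  # 0.85

# But accuracy hides that every class-B instance was misclassified.
b_correct = sum(t == p for t, p in zip(y_true, y_pred) if t == "B")
print(b_correct)  # 0
```

A confusion matrix would expose the second number immediately, because the class-B row would contain no correct predictions.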
Suppose that the test data consists of 40 records, and the classifier predicts whether or not each person has the flu. The following confusion matrix was generated based on the classifier's results:

                          Predicted: has_flu = yes    Predicted: has_flu = no
    Actual: has_flu = yes            13                          3
    Actual: has_flu = no              4                         20
The actual outcomes contain information on whether or not each individual had the flu, while the predicted outcomes are the predictions made by the classification model.
According to the information provided by the confusion matrix:
The model correctly predicted that 13 people had the flu.
The model falsely predicted that 3 people (who actually had the flu) did not have the flu.
The model correctly classified 20 people in the has_flu = no category.
The model incorrectly predicted that 4 people (who actually did not have the flu) had the flu.
The model predicted that 17 (13 + 4) people had the flu.
The model predicted that 23 (20 + 3) people did not have the flu.
In actuality, 16 (13 + 3) people had the flu.
In actuality, 24 (4 + 20) people did not have the flu.
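The counts above can be computed directly by tallying (actual, predicted) pairs. The label lists below are hypothetical but reproduce the same counts as the example: 13 true "yes" predictions, 3 missed "yes" cases, 4 false "yes" predictions, and 20 true "no" predictions.

```python
from collections import Counter

# Hypothetical labels matching the flu example's counts.
actual    = ["yes"] * 13 + ["yes"] * 3 + ["no"] * 4 + ["no"] * 20
predicted = ["yes"] * 13 + ["no"]  * 3 + ["yes"] * 4 + ["no"] * 20

# Each (actual, predicted) pair is one cell of the confusion matrix.
matrix = Counter(zip(actual, predicted))
print(matrix[("yes", "yes")])  # 13 correctly predicted to have the flu
print(matrix[("yes", "no")])   # 3 falsely predicted not to have the flu
print(matrix[("no", "yes")])   # 4 falsely predicted to have the flu
print(matrix[("no", "no")])    # 20 correctly predicted not to have the flu
```

Summing along rows recovers the actual class totals (16 and 24), and summing along columns recovers the predicted totals (17 and 23).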
In standard terminology, these four cells are the true positives (13), false negatives (3), false positives (4), and true negatives (20).
The same concepts apply to a multi-class classifier. For $n$ classes, the confusion matrix will have $n \times n$ dimensions.
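A brief sketch of the multi-class case, using three hypothetical classes, so the matrix is 3 x 3. The convention here is that row $i$ counts instances whose actual class is $i$, and column $j$ counts instances predicted as class $j$:

```python
# Hypothetical 3-class example: the confusion matrix grows to n x n.
labels = ["cat", "dog", "bird"]
actual    = ["cat", "cat", "dog", "bird", "dog", "bird"]
predicted = ["cat", "dog", "dog", "bird", "cat", "bird"]

# matrix[i][j] counts instances of actual class i predicted as class j.
index = {label: i for i, label in enumerate(labels)}
matrix = [[0] * len(labels) for _ in labels]
for a, p in zip(actual, predicted):
    matrix[index[a]][index[p]] += 1

for row in matrix:
    print(row)
```

As in the binary case, the diagonal holds the correct predictions and every off-diagonal cell is a specific kind of misclassification.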