Overview of Image Classification

Get introduced to fundamental concepts of image classification.

Image classification is a branch of computer vision that aims to classify or categorize images into predefined classes. It’s one of the most common tasks in machine learning. The manufacturing, automobile, and healthcare industries prominently use it in their pipelines. Generally, an image classification model is trainable via supervised or unsupervised machine learning methods.

Supervised learning

Supervised classification refers to using labeled datasets to train algorithms for image classification. The output model is able to accurately classify images into their corresponding labels. We can think of it as how a child learns about fruits from a teacher. First, the teacher will show a picture to the student and say that it’s an apple. Then, the teacher will show a different picture and say that it’s a banana. The process continues until the student can classify the fruits independently without help from the teacher. In this example, the teacher provides the student with photos of apples and bananas and let the student know which one was an apple and which one was a banana. We call this labeled data.

The following image illustrates an example of supervised learning:

The model is trained with three classes of labeled data (apple, grape, and banana), and a prediction is made based on three new images to obtain the final output. One major problem with supervised learning is the need for labels. Both the quality and quantity of our labels are important factors that affect the performance of an image classification model. Therefore, some developers might prefer to classify images via unsupervised learning.

Unsupervised learning

Unsupervised classification doesn’t rely on labeled datasets. It’s fully automated to predict the label for images based on certain characteristics or hidden patterns from historical data. Most of the time, it uses either image clustering or pattern recognition. Let’s imagine that there’s a container filled with different kinds of fruits. Our task is to separate them into different groups. Logically, we can group them by color, size, shape, and so on. The following image serves as a reference on unsupervised learning:

The fruits are clustered based on the hidden patterns recognized by the unsupervised model. Unsupervised learning is an excellent approach if we lack labeled data because we can use it to label our data. Then, we can fix the incorrectly labeled data and retrain a new image classification model via supervised learning.


Types

Image classification consists of two types depending on the use cases:

  1. Multi-class classification

  2. Multi-label classification

Multi-class classification

A multi-class classification model, also known as a single-label classification, predicts a single label from a list of available classes. The model is a binary classification model if there are only two labels. Usually, the softmax activation function is used to ensure that the individual score of each label is between zero and one. The scores of all the classes sum up to one, which represents the probability of the prediction.

Note: A softmax activation function normalizes the values into probabilities. The final values sum up to one.

Multi-label classification

In multi-label classification, predictions are independent of each other. This means that an image can have multiple labels. The sigmoid function is applied to bind the scores for each label in the same range.

Note: A sigmoid function transforms the values to be between zero and one. The following table shows the differences between each type of image classification.

Type Image Labels Prediction
Binary classification - Apple
- Banana
Banana
Multi-class classification - Chair
- Fan
- Sofa
- Table
Chair
Multi-label classification - Man
- Woman
- Has mask
- Black hair
- Red hair
- Brown hair
- Blonde hair
Man, Has mask, Black hair

Supervised learning is the desirable choice for most projects as long as there are labeled datasets. Meanwhile, unsupervised learning prevails for use cases that involve clustering and segmentation.