
Understanding NMS (Non-Maximum Suppression)

Explore how Non-Maximum Suppression (NMS) works to remove multiple overlapping bounding boxes in object detection, ensuring only the most confident box remains. Understand NMS for single and multiple classes, confidence scoring, and IoU thresholds to optimize YOLO predictions.

Why do we need NMS?

Object detection models often predict multiple bounding boxes for a single object in an image. However, the final output should ideally have only one bounding box per object. To achieve this, a technique called non-maximum suppression (NMS) is used.

In the image below, three bounding boxes with different confidence scores are predicted for one person. On visual inspection, we would prefer the green box for our final output because it fits the person better than the other two boxes and has the highest confidence score.

Multiple bounding boxes predicted for a single object

How does NMS work?

NMS runs at the inference stage (that is, while making predictions) as a post-processing step. It suppresses redundant boxes and keeps the most confident box for each object.
The format of a bounding box followed in YOLO is $[x_{min}, y_{min}, x_{max}, y_{max}, conf, class]$.

However, before learning NMS, we need to understand the object confidence score, which is simply the predicted probability that a box contains an object. So, we only want to keep boxes that have a high probability of containing an object.
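To make the box format concrete, here is a minimal sketch of how one prediction row could be represented. The `Detection` class and the sample row are hypothetical and chosen only for illustration; they are not part of any YOLO library.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One predicted box in [xmin, ymin, xmax, ymax, conf, class] order (hypothetical helper)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float
    conf: float  # confidence score in the range [0, 1]
    cls: int     # index of the predicted class

    @property
    def area(self) -> float:
        # Width times height, clamped at zero for malformed boxes
        return max(0.0, self.xmax - self.xmin) * max(0.0, self.ymax - self.ymin)


# A made-up row in the format described above
row = [164, 322, 379, 472, 0.97, 0]
det = Detection(*row)
print(det.conf, det.area)
```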

NMS for a single class

Now, let’s understand the NMS algorithm through an example. For simplicity, we will assume it is a single-class problem.

An illustration showing predicted boxes for two persons of a single class with confidence scores

As per the above image, only two boxes, green and pink, should be shown in the final result. We achieve this by performing the following steps:

  1. Because we want our final output to only show boxes with high confidence, we first eliminate all the boxes whose score is below the confidence threshold provided during inference. For instance, if the confidence threshold is set to 0.60, all the boxes with scores below it are discarded (black, red, and blue in this case).

After eliminating boxes with obj_conf<0.60

  2. Now, for the list of remaining boxes (P), we perform the following steps:

    1. Sort the remaining boxes in descending order of their confidence scores.

    2. While there are still boxes in P:

      1. Select the box with the highest confidence score, green (0.85) in our case, remove it from P, and add it to the final output.

      2. Calculate the IoU of the green box with each remaining box, yellow and pink. Using the IoU threshold provided by the user during inference, say 0.50, discard every box in P whose IoU with the green box is greater than the threshold.

      3. With this filtering, the yellow box gets eliminated.

    3. Repeat steps 1 and 2 of the loop on the boxes left in P. Pink is now the highest-scoring box, so it is selected, and P becomes empty. In the end, we are left with two boxes, green and pink, as the final output.
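The steps above can be sketched in plain Python. IoU (intersection over union) is the area where two boxes overlap divided by the area they cover together. This is a minimal sketch rather than a reference implementation: the function names are made up for illustration, and apart from the green box's score of 0.85, all coordinates and scores are hypothetical values chosen to mimic the illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def single_class_nms(boxes, scores, conf_threshold=0.60, iou_threshold=0.50):
    """Return the indices of the boxes that survive NMS, highest score first."""
    # Step 1: discard boxes below the confidence threshold
    candidates = [i for i, s in enumerate(scores) if s >= conf_threshold]
    # Step 2.1: sort the remaining boxes by descending confidence
    candidates.sort(key=lambda i: scores[i], reverse=True)
    kept = []
    # Step 2.2: repeat until no candidates are left
    while candidates:
        best = candidates.pop(0)  # highest-scoring box still in the list
        kept.append(best)
        # Drop every remaining box that overlaps the selected box too much
        candidates = [i for i in candidates
                      if iou(boxes[best], boxes[i]) <= iou_threshold]
    return kept


# Hypothetical data: two persons, six predicted boxes
names = ["green", "yellow", "pink", "black", "red", "blue"]
boxes = [
    (100, 100, 200, 300),  # green: best box for the first person
    (110, 110, 210, 310),  # yellow: near-duplicate of green
    (400, 120, 500, 320),  # pink: best box for the second person
    (90, 80, 180, 250),    # black: low confidence
    (390, 100, 480, 300),  # red: low confidence
    (420, 150, 520, 330),  # blue: low confidence
]
scores = [0.85, 0.70, 0.80, 0.40, 0.50, 0.30]

kept = single_class_nms(boxes, scores)
print([names[i] for i in kept])  # ['green', 'pink']
```

With these values, yellow overlaps green with an IoU of about 0.75, so it is suppressed, while pink does not overlap green at all and is kept.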

NMS for multiple classes

For many of our object detection (OD) tasks, we deal with multiple classes. So, along with the confidence score of the object ($conf_{obj}$), we also use the confidence score of the classes ($conf_{class}$).

This penalizes a box during NMS if either of its confidence scores is low. The process is the same as for a single class, but with the updated confidence scores. We run the algorithm for each class independently. As a result, IoU is only calculated between bounding boxes of the same class.
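One common way to combine the two scores, used in several YOLO implementations, is to multiply them, so that a low value in either one pulls the final score down. The sketch below assumes that convention; the function names and the sample values are hypothetical.

```python
def combined_scores(conf_obj, class_probs):
    """Multiply the object confidence by each class confidence (assumed convention)."""
    return [conf_obj * p for p in class_probs]


def best_class(conf_obj, class_probs):
    """Return (class index, combined score) for the highest-scoring class."""
    scores = combined_scores(conf_obj, class_probs)
    cls = max(range(len(scores)), key=scores.__getitem__)
    return cls, scores[cls]


# A confident object with a confident class keeps a high score
cls, conf = best_class(0.9, [0.8, 0.1])
print(cls, round(conf, 2))  # 0 0.72

# A confident object with an uncertain class is penalized
weak_cls, weak_conf = best_class(0.9, [0.3, 0.2])
print(weak_cls, round(weak_conf, 2))  # 0 0.27
```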

An illustration showing predicted boxes for two classes: a person and a mobile

This is almost identical to the single-class algorithm. The only difference is that IoU is calculated only between boxes of the same class, so a box never suppresses a box of a different class.
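A minimal sketch of the per-class variant groups the detections by class and runs the single-class procedure on each group. All names and values below are hypothetical, and the mobile box is deliberately exaggerated so that it overlaps the person box with an IoU of about 0.70. It still survives because it is only ever compared with other mobile boxes.

```python
from collections import defaultdict


def iou(a, b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold):
    """Greedy single-class NMS; returns kept indices, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    while order:
        best = order.pop(0)
        kept.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return kept


def multiclass_nms(detections, conf_threshold=0.60, iou_threshold=0.50):
    """detections: list of (xmin, ymin, xmax, ymax, conf, cls) tuples."""
    # Group the confident detections by class
    by_class = defaultdict(list)
    for det in detections:
        if det[4] >= conf_threshold:
            by_class[det[5]].append(det)
    # Run NMS independently inside each class
    kept = []
    for dets in by_class.values():
        boxes = [d[:4] for d in dets]
        scores = [d[4] for d in dets]
        kept.extend(dets[i] for i in nms(boxes, scores, iou_threshold))
    return kept


detections = [
    (100, 100, 300, 500, 0.90, "person"),
    (110, 110, 310, 510, 0.75, "person"),  # near-duplicate of the first person box
    (120, 150, 290, 480, 0.80, "mobile"),  # overlaps the person box heavily
    (125, 155, 295, 485, 0.65, "mobile"),  # near-duplicate of the first mobile box
]

kept = multiclass_nms(detections)
print([(d[5], d[4]) for d in kept])  # [('person', 0.9), ('mobile', 0.8)]
```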

Time to code!

This example shows an input image with multiple overlapping bounding boxes around objects and their associated confidence scores. Our goal is to filter and display only one bounding box per object based on the confidence score threshold and the IoU threshold.

The following code displays the original image with multiple bounding boxes:

Python 3.8
import cv2
import numpy as np

image_path = "aug_bir.jpg"
image = cv2.imread(image_path)

# Bounding box coordinates as [xmin, ymin, xmax, ymax]
boxes = np.array([[164, 322, 379, 472], [184, 350, 400, 500], [150, 250, 300, 450]])
# Confidence score for the corresponding bounding box
scores = np.array([0.97, 0.65, 0.55])

# Draw every predicted box in green, with its score in red above the top-left corner
for box, score in zip(boxes, scores):
    x1, y1, x2, y2 = map(int, box)  # plain ints for the OpenCV drawing calls
    cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2)
    cv2.putText(image, f"{score:.2f}", (x1, y1 - 5), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)

# Label the image and save it
cv2.putText(image, "Before NMS", (30, 50), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 0), 3)
cv2.imwrite("output/orig.png", image)

Steps

In the widget below:

  • Play with the confidence and IoU thresholds to remove the overlapping bounding boxes around the object.

  • Display the filtered boxes and the corresponding confidence scores in an image.

Python 3.8
# Play around with the NMS thresholds to remove multiple detections
confidence_threshold = 0.50
iou_threshold = 0.50

# Filter the boxes and scores based on the confidence threshold
filtered_boxes = boxes[scores >= confidence_threshold]
filtered_scores = scores[scores >= confidence_threshold]

# Apply NMS. nms() is not defined in this snippet; it is assumed to be supplied
# by the exercise environment, with the IoU threshold as its third argument.
indices = nms(filtered_boxes, filtered_scores, iou_threshold)

"""Add your code here"""
cv2.putText(image, "After NMS", (30, 50), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 0), 3)
cv2.imwrite("output/After.png", image)

Explanation

The following steps effectively remove the overlapping bounding boxes around objects in the input image, keeping only one bounding box per object based on the user-defined confidence score and IoU thresholds.

  1. Read an input image and define the bounding box coordinates and corresponding confidence scores.

  2. Set the confidence score threshold and the IoU threshold for the non-maximum suppression (NMS) process.

  3. Filter out the bounding boxes with confidence scores below the specified threshold.

  4. Apply the NMS function to the filtered bounding boxes and their corresponding confidence scores using the specified IoU threshold. This step eliminates overlapping bounding boxes and retains only the bounding box with the highest confidence score for each object.

  5. Draw the final bounding boxes on the input image and display the result.
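The lesson does not show how the `nms` helper called in the widget is implemented. The following is one possible NumPy sketch of steps 2 to 4, not the course's actual implementation. It assumes that `nms` takes an IoU threshold as its third argument, returns indices into the filtered arrays, and receives boxes with a positive area. Drawing is left out so the snippet runs without an image file.

```python
import numpy as np


def nms(boxes, scores, iou_threshold):
    """Greedy NMS on an (N, 4) array of [xmin, ymin, xmax, ymax] boxes.

    Returns the indices of the kept boxes, highest score first.
    """
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    order = np.argsort(-scores)  # indices sorted by descending score
    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(int(best))
        # Intersection of the selected box with every remaining box
        x1 = np.maximum(boxes[best, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[best, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[best, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[best, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        iou = inter / (areas[best] + areas[rest] - inter)
        # Keep only the boxes that do not overlap the selected box too much
        order = rest[iou <= iou_threshold]
    return keep


# Boxes and scores from the example above
boxes = np.array([[164, 322, 379, 472], [184, 350, 400, 500], [150, 250, 300, 450]])
scores = np.array([0.97, 0.65, 0.55])

confidence_threshold = 0.50
iou_threshold = 0.50

# Step 3: filter by confidence
mask = scores >= confidence_threshold
filtered_boxes, filtered_scores = boxes[mask], scores[mask]

# Step 4: suppress overlapping boxes
indices = nms(filtered_boxes, filtered_scores, iou_threshold)
print(indices)  # [0, 2]
print(filtered_scores[indices])
```

With these thresholds, the second box (0.65) overlaps the first with an IoU of about 0.58 and is suppressed, while the third box (0.55) overlaps it with an IoU of about 0.39 and is kept. Raising `confidence_threshold` to 0.60 removes the third box as well, leaving a single detection. The kept boxes, `filtered_boxes[indices]`, can then be drawn with the same `cv2.rectangle` and `cv2.putText` calls used for the original image.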