What is IoU?

Intersection over Union (IoU) is a metric commonly used in object detection tasks, including YOLO, to evaluate the accuracy of predicted bounding boxes. It is an important concept in computer vision that measures how much overlap (intersection) exists between two polygons. A higher IoU score means a larger overlap. It ranges from [0, 1], where 0 indicates no overlap between the two boxes and 1 indicates a perfect match.

Note: The objective of a model is to predict bounding boxes that are perfectly aligned with an actual object (i.e., achieving an overlap close to 1).

An illustration of actual bounding box and predicted bounding box

Why is IoU needed?

IoU is required in object detection for two tasks:

  • Evaluation metric: The aim of the OD model is not only to predict a bounding box around an object but also to ensure that the predicted box fits the object as tightly as possible, in other words, that the predicted box is as close to the ground truth box as possible. As we can see in the image below, there is a significant overlap between the green and red boxes, but the predicted box is still not accurate. The model aims to learn to make that overlap close to 1.

An illustration of an actual bounding box and predicted bounding box
  • Applying NMS: As discussed earlier, because the number of predicted bounding boxes is high, it is common to have multiple boxes predicted for a single object. To exclude these extra boxes, NMS is used, which eliminates boxes based on the confidence score (the probability that a detected bounding box contains an object and accurately reflects the object’s location and dimensions).
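The suppression step described above can be sketched in a few lines of plain Python. This is an illustrative greedy NMS, not the exact implementation any particular detector uses:

```python
def iou(a, b):
    """IoU of two boxes in (xmin, ymin, xmax, ymax) format."""
    xi1, yi1 = max(a[0], b[0]), max(a[1], b[1])  # top-left of the overlap
    xi2, yi2 = min(a[2], b[2]), min(a[3], b[3])  # bottom-right of the overlap
    inter = max(0, xi2 - xi1) * max(0, yi2 - yi1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap it too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep
```

With three candidate boxes where the first two cover the same object, `nms` keeps only the higher-scoring duplicate plus the unrelated box.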

Ground-truth

Before understanding the IoU calculation, we first need to understand the GT (ground-truth) box and learn how we get it. For training any object detection model, we need a labeled dataset. This labeling is usually done through a tool, for example, LabelImg. It requires manual effort and a lot of precision. As the machine learning saying “garbage in, garbage out” signifies, if our data is not labeled correctly, no matter what model we use, we will never get a good result.

Time to practice: Annotate an image using the LabelImg GUI

  1. Select “Open Dir” on the left-hand side and click the “Choose” button at the bottom-right corner. Choose the cancel.png image to start annotation.

  2. Annotating images:

    1. To start annotation, click the “Create RectBox” button. This will change your mouse cursor to a crosshair, allowing you to draw a rectangular bounding box on the image.

    2. After drawing the bounding box, LabelImg will prompt you to provide a label. You can either add a new label or select one from the predefined list of labels in the drop-down menu.

  3. Save the label. Please note that you’re using a version that will save labels as XML files in the PASCAL VOC format.
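A PASCAL VOC file stores each box as <xmin>, <ymin>, <xmax>, <ymax> tags inside an <object> element. As a rough sketch (the helper name and file path here are made up for illustration), the saved labels can be read back with Python’s standard library:

```python
import xml.etree.ElementTree as ET

def read_voc_boxes(xml_path):
    """Parse a PASCAL VOC annotation file into (label, xmin, ymin, xmax, ymax) tuples."""
    root = ET.parse(xml_path).getroot()
    boxes = []
    for obj in root.iter("object"):
        name = obj.find("name").text
        bb = obj.find("bndbox")
        coords = [int(float(bb.find(tag).text)) for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append((name, *coords))
    return boxes
```

These parsed corner coordinates are exactly what the IoU calculation below consumes.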

Annotating an image with LabelImg

How is IoU calculated?

Before understanding the calculations of IoU, it is important to understand the axis system used in computer vision models. When dealing with OD models, we can use coordinates in either of the two formats:

  • $(x_{min}, y_{min}, x_{max}, y_{max})$

  • $(x_{min}, y_{min}, w, h)$

An axis system in computer vision applications
A representation of the coordinates of the bounding box
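Converting between the two formats is straightforward; a minimal sketch:

```python
def corners_to_wh(xmin, ymin, xmax, ymax):
    """(xmin, ymin, xmax, ymax) -> (xmin, ymin, w, h)."""
    return xmin, ymin, xmax - xmin, ymax - ymin

def wh_to_corners(xmin, ymin, w, h):
    """(xmin, ymin, w, h) -> (xmin, ymin, xmax, ymax)."""
    return xmin, ymin, xmin + w, ymin + h
```

The IoU derivation below assumes the corner format, so boxes labeled as (xmin, ymin, w, h) must first pass through a conversion like `wh_to_corners`.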

Calculating the intersection area of the two boxes

To calculate the intersection points of two bounding boxes, we need to find the top-left and bottom-right coordinates of the overlapping region. Here’s a simplified explanation:

  1. We determine the top-left intersection point $(x_{int1}, y_{int1})$:

    1. We compare the top-left coordinates of box1 $(x_1, y_1)$ and box2 $(x_3, y_3)$.

    2. $x_{int1} = \max(x_1, x_3)$

    3. $y_{int1} = \max(y_1, y_3)$

  2. We determine the bottom-right intersection point $(x_{int2}, y_{int2})$:

    1. We compare the bottom-right coordinates of box1 $(x_2, y_2)$ and box2 $(x_4, y_4)$.

    2. $x_{int2} = \min(x_2, x_4)$

    3. $y_{int2} = \min(y_2, y_4)$

  3. We calculate the width and height of the overlapping region, clamping at zero so that non-overlapping boxes yield an intersection of zero:

    1. $w_{int} = \max(0,\ x_{int2} - x_{int1})$

    2. $h_{int} = \max(0,\ y_{int2} - y_{int1})$

Pictorial representation of IoU calculation

By finding the intersection points and calculating the width and height, we can determine the area of the overlapping region between the two bounding boxes.
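The three steps above map directly to code; a minimal sketch, assuming boxes in the $(x_{min}, y_{min}, x_{max}, y_{max})$ format:

```python
def intersection_area(box1, box2):
    """Overlap area of two boxes given as (xmin, ymin, xmax, ymax)."""
    x1, y1, x2, y2 = box1
    x3, y3, x4, y4 = box2
    x_int1, y_int1 = max(x1, x3), max(y1, y3)  # top-left of the overlap
    x_int2, y_int2 = min(x2, x4), min(y2, y4)  # bottom-right of the overlap
    w_int = max(0, x_int2 - x_int1)            # 0 when the boxes don't overlap
    h_int = max(0, y_int2 - y_int1)
    return w_int * h_int
```

Clamping the width and height at zero is what makes disjoint boxes return an area of 0 instead of a spurious positive value.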

Calculating the total area of the two boxes

  • Because we have the top-left and bottom-right coordinates of these two boxes, we can easily calculate the areas:

    • $area(box1) = (y_2 - y_1) \times (x_2 - x_1)$

    • $area(box2) = (y_4 - y_3) \times (x_4 - x_3)$

    • $area(union) = area(box1) + area(box2) - area(intersection)$

  • Now, we can simply calculate IoU by:
    $IoU = area(intersection) \div area(union)$

Exercise: Calculate the IoU if two bounding boxes are given

We are given two lists containing bounding box coordinates in the $(x_{min}, y_{min}, x_{max}, y_{max})$ format. The task is to calculate the overlap between these two boxes.

""" Given below 2 list of bounding boxes, write a function to calculate IoU
example-
box1 = [45,70, 383, 241]
box2 = [15, 60, 200, 156]
"""
def calculate_iou(box1, box2):
#write your code here
return iou_score #iou score should be rounded to 4 decimal places
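One possible solution, assuming the $(x_{min}, y_{min}, x_{max}, y_{max})$ format of the example boxes:

```python
def calculate_iou(box1, box2):
    """IoU of two boxes in (xmin, ymin, xmax, ymax) format, rounded to 4 decimals."""
    x1, y1, x2, y2 = box1
    x3, y3, x4, y4 = box2
    # Overlap rectangle, clamped at 0 for disjoint boxes
    w_int = max(0, min(x2, x4) - max(x1, x3))
    h_int = max(0, min(y2, y4) - max(y1, y3))
    intersection = w_int * h_int
    union = (x2 - x1) * (y2 - y1) + (x4 - x3) * (y4 - y3) - intersection
    iou_score = intersection / union if union else 0.0
    return round(iou_score, 4)
```

For the example boxes, `calculate_iou(box1, box2)` returns 0.2142.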

Additional exercise [optional]

Please note that although we extensively use IoU to calculate overlap between two boxes, there may be scenarios where we want to calculate an overlap between a polygon and a bounding box. Try to think of a logic for such a case and implement it.

Here are some real-life scenarios where calculating overlap between a polygon and a bounding box could be useful:

  • Object segmentation in images: In computer vision tasks, where we need to segment objects in an image, the object’s shape can be represented by a polygon. Comparing the overlap between the polygon and a bounding box can help refine the object’s location and size.

  • Collision detection in games: In video games, complex objects can be represented by polygons, and bounding boxes can be used for a quick approximation of the object’s location. Calculating the overlap between the polygon and the bounding box can help determine if a collision has occurred.

import cv2
import numpy as np
import matplotlib.pyplot as plt
import random
import os

def generate_random_polygon(size, vertex_count):
    vertices = []
    for _ in range(vertex_count):
        x = random.randint(0, size)
        y = random.randint(0, size)
        vertices.append((x, y))
    vertices = np.array(vertices, dtype=np.float32)
    hull = cv2.convexHull(vertices)
    return hull

def polygon_bbox_overlap(polygon_coords, bbox_coords):
    polygon = polygon_coords.astype(np.int32)
    bbox = np.array([[bbox_coords[0], bbox_coords[1]],
                     [bbox_coords[2], bbox_coords[1]],
                     [bbox_coords[2], bbox_coords[3]],
                     [bbox_coords[0], bbox_coords[3]]], dtype=np.int32).reshape((-1, 1, 2))
    ret, intersection = cv2.intersectConvexConvex(polygon, bbox)
    if intersection is not None:
        intersection_exists = intersection.size > 0
    else:
        intersection_exists = False
    return polygon, bbox, intersection_exists, intersection

def plot_polygon_bbox(polygon, bbox, intersection_exists, intersection):
    rgb_img = np.zeros((500, 500, 3), dtype=np.uint8)
    cv2.fillPoly(rgb_img, [bbox], color=(0, 255, 0))
    cv2.fillPoly(rgb_img, [polygon], color=(255, 0, 0))
    if intersection_exists:
        # display the overlap in purple
        cv2.fillPoly(rgb_img, [intersection.astype(np.int32)], color=(255, 0, 255))
    plt.imshow(rgb_img)
    plt.gca().invert_yaxis()
    if not os.path.exists("output"):
        os.makedirs("output")
    plt.savefig("output/binary.png")

img_size = 500
polygon_vertex_count = 6
polygon_coords = generate_random_polygon(img_size, polygon_vertex_count)
bbox_coords = (150, 100, 250, 300)
polygon, bbox, intersection_exists, intersection = polygon_bbox_overlap(polygon_coords, bbox_coords)
if intersection_exists:
    polygon_area = cv2.contourArea(polygon)
    bbox_area = cv2.contourArea(bbox)
    intersection_area = cv2.contourArea(intersection)
    union_area = polygon_area + bbox_area - intersection_area
    if union_area == 0:
        iou = 0
    else:
        iou = intersection_area / union_area
    print(f"Intersection area: {intersection_area}")
    print(f"IoU: {iou}")
else:
    print("No intersection between the polygon and the bounding box.")
plot_polygon_bbox(polygon, bbox, intersection_exists, intersection)

Explanation

  • The generate_random_polygon(size, vertex_count) function samples the requested number of random points within a square area of the given size and returns their convex hull as the polygon.

  • The polygon_bbox_overlap(polygon_coords, bbox_coords) function calculates the intersection between a polygon and a bounding box using cv2.intersectConvexConvex. It returns the polygon and bounding box coordinates, a boolean indicating whether there is an intersection, and the intersection coordinates if one exists.

  • The plot_polygon_bbox(polygon, bbox, intersection_exists, intersection) function visualizes the polygon, bounding box, and their intersection (if it exists) using Matplotlib. It creates an RGB image, fills the bounding box and polygon with different colors, and fills the intersection area with a third color.

  • The polygon_coords = generate_random_polygon(img_size, polygon_vertex_count) line generates the random polygon after the image size and vertex count have been set.

  • The polygon, bbox, intersection_exists, intersection = polygon_bbox_overlap(polygon_coords, bbox_coords) line calculates the intersection between the polygon and the bounding box; the script then derives the IoU from the contour areas.
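If OpenCV is not available, the same polygon-vs-bbox overlap can be computed from first principles: clip the polygon to the box with the Sutherland-Hodgman algorithm, then take the area of the clipped polygon with the shoelace formula. A plain-Python sketch (function names are illustrative):

```python
def clip_polygon_to_bbox(polygon, bbox):
    """Sutherland-Hodgman: clip a polygon (list of (x, y) points) against an
    axis-aligned box (xmin, ymin, xmax, ymax). Returns the clipped polygon."""
    xmin, ymin, xmax, ymax = bbox
    # Each box edge gives an inside-test and a line-intersection helper.
    edges = [
        (lambda p: p[0] >= xmin,
         lambda a, b: (xmin, a[1] + (b[1] - a[1]) * (xmin - a[0]) / (b[0] - a[0]))),
        (lambda p: p[0] <= xmax,
         lambda a, b: (xmax, a[1] + (b[1] - a[1]) * (xmax - a[0]) / (b[0] - a[0]))),
        (lambda p: p[1] >= ymin,
         lambda a, b: (a[0] + (b[0] - a[0]) * (ymin - a[1]) / (b[1] - a[1]), ymin)),
        (lambda p: p[1] <= ymax,
         lambda a, b: (a[0] + (b[0] - a[0]) * (ymax - a[1]) / (b[1] - a[1]), ymax)),
    ]
    output = list(polygon)
    for inside, intersect in edges:
        if not output:
            break  # polygon entirely outside this edge
        input_pts, output = output, []
        for i, cur in enumerate(input_pts):
            prev = input_pts[i - 1]
            if inside(cur):
                if not inside(prev):
                    output.append(intersect(prev, cur))  # entering the half-plane
                output.append(cur)
            elif inside(prev):
                output.append(intersect(prev, cur))      # leaving the half-plane
    return output

def shoelace_area(polygon):
    """Area of a simple polygon via the shoelace formula."""
    n = len(polygon)
    if n < 3:
        return 0.0
    s = sum(polygon[i][0] * polygon[(i + 1) % n][1]
            - polygon[(i + 1) % n][0] * polygon[i][1] for i in range(n))
    return abs(s) / 2.0
```

For example, clipping the triangle (0, 0), (4, 0), (0, 4) to the box (0, 0, 2, 2) leaves the whole box, so the overlap area is 4.0.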