Understanding IoU (Intersection over Union)/Jaccard Index
Understand IoU’s importance and calculation by learning its significance in object detection.
What is IoU?
Intersection over Union (IoU) is a metric commonly used in object detection tasks, including YOLO, to evaluate the accuracy of predicted bounding boxes. It is an important concept in computer vision that measures how much two regions (boxes or, more generally, polygons) overlap relative to their combined area. A higher IoU means a larger overlap. It ranges from 0 to 1, where 0 indicates no overlap between the two boxes and 1 indicates that they coincide exactly.
Note: The objective of a model is to predict bounding boxes that are perfectly aligned with an actual object (i.e., achieving an overlap close to 1).
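Written as a formula, the definition above can be sketched as follows. The symbols A (the predicted box) and B (the ground-truth box) are chosen here for illustration and do not come from the lesson itself:

```latex
\mathrm{IoU}(A, B) = \frac{|A \cap B|}{|A \cup B|}, \qquad 0 \le \mathrm{IoU}(A, B) \le 1
```

Here |A ∩ B| is the area shared by the two boxes and |A ∪ B| is the area covered by at least one of them.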
Why is IoU needed?
IoU is required in object detection for two tasks:
Evaluation metric: The aim of the object detection (OD) model is not only to predict a bounding box around an object but also to ensure that the predicted box fits the object tightly. In other words, the predicted box should be as close to the ground-truth box as possible. As we can see in the image below, there is a significant overlap between the green and red boxes, but the predicted box is still not accurate. The model aims to learn to make that overlap close to 1.
Applying NMS: As discussed earlier, because the number of bounding boxes predicted is high, it is common to have multiple bounding boxes predicted for a single object. To exclude these extra boxes, non-maximum suppression (NMS) is used, which eliminates boxes based on the confidence score and on their IoU with a higher-scoring box. The confidence score is the probability that a detected bounding box contains an object and accurately reflects the object's location and dimensions.
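To make the role of IoU in NMS concrete, here is a minimal sketch of a greedy NMS loop. It assumes boxes are given as (x_min, y_min, x_max, y_max); the function names `iou` and `greedy_nms`, the threshold of 0.5, and the sample boxes are illustrative choices, not taken from any particular library.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max) (assumed format)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return the indices of the boxes kept, highest confidence first."""
    # Visit boxes from the highest to the lowest confidence score.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the kept box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


# Made-up example: the first two boxes cover the same object.
boxes = [(10, 10, 110, 110), (12, 12, 112, 112), (200, 200, 300, 300)]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores))  # [0, 2]
```

The second box is removed because its IoU with the higher-scoring first box is above the threshold, while the third box does not overlap either of them and is kept.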
Ground-truth
Before understanding the IoU calculation, we first need to understand the GT (ground-truth) box and learn how we get it. For training any object detection model, we need a labeled dataset. This labeling is usually done through a tool, for example, LabelImg. It requires manual effort and a lot of precision. As the machine learning saying "garbage in, garbage out" signifies, if our data is not labeled correctly, no matter what model we use, we will never get a good result.
Time to practice: Annotate an image using the LabelImg GUI
Select “Open Dir” on the left-hand side and click the “Choose” button at the bottom-right corner. Choose any image to start annotation.
Annotating images:
To start annotation, click the “Create RectBox” button. This will change your mouse cursor to a crosshair, allowing you to draw a rectangular bounding box on the image.
After drawing the bounding box, LabelImg will prompt you to provide a label. You can either add a new label or select one from the predefined list of labels in the drop-down menu.
Save the label. Please note that you’re using a version that will save labels as XML files in the PASCAL VOC format.
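Once labels are saved, the ground-truth boxes can be read back from the XML. The sketch below parses a PASCAL VOC style annotation with Python's standard library; the sample XML (file name, label, and coordinates) is made up for illustration, and `read_voc_boxes` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Made-up annotation in the PASCAL VOC layout (illustrative values only).
SAMPLE_XML = """<annotation>
  <filename>example.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>dog</name>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
</annotation>"""


def read_voc_boxes(xml_text):
    """Return a list of (label, [xmin, ymin, xmax, ymax]) tuples."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        label = obj.findtext("name")
        bndbox = obj.find("bndbox")
        # Coordinates are stored as text, so convert them to integers.
        coords = [int(float(bndbox.findtext(tag)))
                  for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append((label, coords))
    return boxes


print(read_voc_boxes(SAMPLE_XML))  # [('dog', [48, 240, 195, 371])]
```

For a file on disk, `ET.parse(path).getroot()` can replace `ET.fromstring(xml_text)`.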
How is IoU calculated?
Before understanding the calculations of IoU, it is important to understand the axis system used in computer vision models. When dealing with OD models, we can use coordinates in either of the two formats:
Calculating the intersection area of the two boxes
To calculate the intersection points of two bounding boxes, we need to find the top-left and bottom-right coordinates of the overlapping region. Here’s a simplified explanation:
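We determine the top-left intersection point $(x_1^I, y_1^I)$: We compare the top-left coordinates of box1 $(x_1^A, y_1^A)$ and box2 $(x_1^B, y_1^B)$ and take the larger value on each axis, so $x_1^I = \max(x_1^A, x_1^B)$ and $y_1^I = \max(y_1^A, y_1^B)$. This assumes image coordinates with the origin at the top-left, where x grows to the right and y grows downward.
We determine the bottom-right intersection point $(x_2^I, y_2^I)$: We compare the bottom-right coordinates of box1 $(x_2^A, y_2^A)$ and box2 $(x_2^B, y_2^B)$ and take the smaller value on each axis, so $x_2^I = \min(x_2^A, x_2^B)$ and $y_2^I = \min(y_2^A, y_2^B)$.
We calculate the width and height of the overlapping region: $w = \max(0, x_2^I - x_1^I)$ and $h = \max(0, y_2^I - y_1^I)$. Clamping at 0 handles boxes that do not overlap, in which case the intersection area $w \times h$ is 0.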
By finding the intersection points and calculating the width and height, we can determine the area of the overlapping region between the two bounding boxes.
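The steps above can be sketched in a few lines of Python. This assumes each box is a list in [x_min, y_min, x_max, y_max] order with the origin at the top-left of the image; `intersection_area` is an illustrative name.

```python
def intersection_area(box1, box2):
    """Overlap area of two boxes given as [x_min, y_min, x_max, y_max]."""
    # Top-left corner of the overlap: the larger of the two top-left values.
    x_left = max(box1[0], box2[0])
    y_top = max(box1[1], box2[1])
    # Bottom-right corner of the overlap: the smaller of the two values.
    x_right = min(box1[2], box2[2])
    y_bottom = min(box1[3], box2[3])
    # Clamp at zero so boxes that do not overlap give an area of 0.
    width = max(0, x_right - x_left)
    height = max(0, y_bottom - y_top)
    return width * height


print(intersection_area([0, 0, 4, 4], [2, 2, 6, 6]))  # 4
print(intersection_area([0, 0, 1, 1], [2, 2, 3, 3]))  # 0
```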
Calculating the total area of the two boxes
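Because we have the top-left $(x_1, y_1)$ and bottom-right $(x_2, y_2)$ coordinates of each box, we can easily calculate its area: $\text{area} = (x_2 - x_1) \times (y_2 - y_1)$. Adding the two areas counts the overlapping region twice, so the area of the union is the sum of the two areas minus the intersection area.
Now, we can simply calculate IoU by: $\text{IoU} = \dfrac{\text{intersection area}}{\text{area of box1} + \text{area of box2} - \text{intersection area}}$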
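Putting the pieces together, one possible sketch of the full calculation is shown below. It assumes boxes in [x_min, y_min, x_max, y_max] order; `box_area` and `iou_xyxy` are hypothetical names, the sample boxes are made up, and rounding to four decimal places is a presentation choice.

```python
def box_area(box):
    """Area of a box given as [x_min, y_min, x_max, y_max]."""
    return (box[2] - box[0]) * (box[3] - box[1])


def iou_xyxy(box1, box2):
    """IoU of two axis-aligned boxes, rounded to 4 decimal places."""
    # Width and height of the overlap, clamped at zero for disjoint boxes.
    inter_w = max(0, min(box1[2], box2[2]) - max(box1[0], box2[0]))
    inter_h = max(0, min(box1[3], box2[3]) - max(box1[1], box2[1]))
    inter = inter_w * inter_h
    # The overlap is counted in both areas, so subtract it once.
    union = box_area(box1) + box_area(box2) - inter
    if union == 0:
        return 0.0
    return round(inter / union, 4)


print(iou_xyxy([0, 0, 10, 10], [5, 5, 15, 15]))   # 0.1429
print(iou_xyxy([0, 0, 10, 10], [0, 0, 10, 10]))   # 1.0
print(iou_xyxy([0, 0, 10, 10], [20, 20, 30, 30])) # 0.0
```

In the first call the overlap is 5 × 5 = 25 and the union is 100 + 100 - 25 = 175, which gives 25 / 175 ≈ 0.1429.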
Exercise: Calculate the IoU if two bounding boxes are given
We are given two lists consisting of the bounding box coordinates in the (
"""
Given below 2 list of bounding boxes, write a function to calculate IoU
example-
box1 = [45,70, 383, 241]
box2 = [15, 60, 200, 156]
"""
def calculate_iou(box1, box2):
    # write your code here
    return iou_score  # iou score should be rounded to 4 decimal places
Additional exercise [optional]
Please note that although we extensively use IoU to calculate the overlap between two boxes, there may be scenarios where we want to calculate the overlap between a polygon and a bounding box. Try to work out the logic for such a case and implement it.
Here are some real-life scenarios where calculating overlap between a polygon and a bounding box could be useful:
Object segmentation in images: In computer vision tasks, where we need to segment objects in an image, the object’s shape can be represented by a polygon. Comparing the overlap between the polygon and a bounding box can help refine the object’s location and size.
Collision detection in games: In video games, complex objects can be represented by polygons, and bounding boxes can be used for a quick approximation of the object’s location. Calculating the overlap between the polygon and the bounding box can help determine if a collision has occurred.
import cv2
import numpy as np
import matplotlib.pyplot as plt
import random
import os

def generate_random_polygon(size, vertex_count):
    # Sample random points and keep their convex hull so the polygon is convex.
    vertices = []
    for _ in range(vertex_count):
        x = random.randint(0, size)
        y = random.randint(0, size)
        vertices.append((x, y))
    vertices = np.array(vertices, dtype=np.float32)
    return cv2.convexHull(vertices)

def polygon_bbox_overlap(polygon_coords, bbox_coords):
    polygon = polygon_coords.astype(np.int32)
    # Bounding box (x_min, y_min, x_max, y_max) as a four-corner contour.
    x_min, y_min, x_max, y_max = bbox_coords
    bbox = np.array([[x_min, y_min], [x_max, y_min],
                     [x_max, y_max], [x_min, y_max]],
                    dtype=np.int32).reshape((-1, 1, 2))
    # intersectConvexConvex returns the overlap area and the overlap polygon.
    ret, intersection = cv2.intersectConvexConvex(polygon, bbox)

    if intersection is not None:
        intersection_exists = intersection.size > 0
    else:
        intersection_exists = False

    return polygon, bbox, intersection_exists, intersection

def plot_polygon_bbox(polygon, bbox, intersection_exists, intersection):
    rgb_img = np.zeros((500, 500, 3), dtype=np.uint8)
    cv2.fillPoly(rgb_img, [bbox], color=(0, 255, 0))     # bounding box in green
    cv2.fillPoly(rgb_img, [polygon], color=(255, 0, 0))  # polygon in red

    if intersection_exists:
        # Draw the overlap in purple.
        cv2.fillPoly(rgb_img, [intersection.astype(np.int32)], color=(255, 0, 255))

    plt.imshow(rgb_img)
    plt.gca().invert_yaxis()
    # Save the figure, creating the output directory if needed.
    os.makedirs("output", exist_ok=True)
    plt.savefig("output/binary.png")

img_size = 500
polygon_vertex_count = 6
polygon_coords = generate_random_polygon(img_size, polygon_vertex_count)

bbox_coords = (150, 100, 250, 300)
polygon, bbox, intersection_exists, intersection = polygon_bbox_overlap(polygon_coords, bbox_coords)

if intersection_exists:
    polygon_area = cv2.contourArea(polygon)
    bbox_area = cv2.contourArea(bbox)
    intersection_area = cv2.contourArea(intersection)
    union_area = polygon_area + bbox_area - intersection_area
    if union_area == 0:
        iou = 0
    else:
        iou = intersection_area / union_area
    print(f"Intersection area: {intersection_area}")
    print(f"IoU: {iou}")
else:
    print("No intersection between the polygon and the bounding box.")

plot_polygon_bbox(polygon, bbox, intersection_exists, intersection)
Explanation
Line 7: The generate_random_polygon(size, vertex_count) function generates a random polygon with a specified number of vertices within a square area of a given size.
Line 17: The polygon_bbox_overlap(polygon_coords, bbox_coords) function calculates the intersection between a polygon and a bounding box. It returns the coordinates of the polygon and the bounding box, a boolean indicating whether there is an intersection, and the intersection coordinates if it exists.
Line 34: The plot_polygon_bbox(polygon, bbox, intersection_exists, intersection) function visualizes the polygon, bounding box, and their intersection (if it exists) on an image using Matplotlib. It creates an RGB image, fills the bounding box and polygon with different colors, and fills the intersection area with another color.
Lines 49 to 51: These lines set the image size and the number of vertices for the random polygon, ending with polygon_coords = generate_random_polygon(img_size, polygon_vertex_count), which generates the random polygon using the generate_random_polygon function.
Line 54: The polygon, bbox, intersection_exists, intersection = polygon_bbox_overlap(polygon_coords, bbox_coords) line calculates the intersection between the polygon and the bounding box using the polygon_bbox_overlap function.
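The areas in the code above come from cv2.contourArea. For a simple polygon, the same quantity can be illustrated without OpenCV using the shoelace formula. The sketch below is only an illustration of the idea, not OpenCV's implementation, and `shoelace_area` is a hypothetical name.

```python
def shoelace_area(points):
    """Area of a simple polygon given as a list of (x, y) vertices in order."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]  # wrap around to close the polygon
        total += x0 * y1 - x1 * y0
    # The sign depends on the vertex order, so take the absolute value.
    return abs(total) / 2.0


# The bounding box used above, (150, 100, 250, 300), as four corners.
bbox_corners = [(150, 100), (250, 100), (250, 300), (150, 300)]
print(shoelace_area(bbox_corners))  # 20000.0
```

The result matches the width times the height of the box, 100 × 200.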