What are anchor boxes?

Anchor boxes are predefined bounding boxes, each defined by a width and a height. Their purpose is to capture the scales and aspect ratios of the object classes that appear in a dataset. Earlier, anchor boxes were selected manually for each dataset. YOLOv5 introduced a feature known as auto-anchor, which automates this selection.
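To make the idea of automated selection concrete, here is a minimal sketch that derives anchor shapes by clustering the widths and heights of labeled boxes with plain k-means. It is a simplified illustration and not YOLOv5's actual auto-anchor routine; the function name `kmeans_anchors` and the sample box sizes are assumptions made for this example.

```python
def kmeans_anchors(boxes_wh, k, iters=50):
    """Cluster (width, height) pairs into k anchor shapes with plain k-means.

    Illustrative only: a simplified stand-in for automated anchor selection.
    """
    boxes = [(float(w), float(h)) for w, h in boxes_wh]
    n = len(boxes)

    # Deterministic start: pick k boxes spread evenly across the area-sorted list.
    by_area = sorted(boxes, key=lambda b: b[0] * b[1])
    centers = [by_area[round(i * (n - 1) / max(k - 1, 1))] for i in range(k)]

    for _ in range(iters):
        # Assign each box to the nearest center (squared Euclidean distance).
        clusters = [[] for _ in range(k)]
        for w, h in boxes:
            nearest = min(
                range(k),
                key=lambda i: (w - centers[i][0]) ** 2 + (h - centers[i][1]) ** 2,
            )
            clusters[nearest].append((w, h))

        # Move each center to the mean of its boxes; leave empty clusters alone.
        new_centers = [
            (sum(w for w, _ in c) / len(c), sum(h for _, h in c) / len(c))
            if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers

    # Return anchors ordered from smallest to largest area.
    return sorted(centers, key=lambda c: c[0] * c[1])


# Hypothetical box sizes in pixels: three tall boxes and three wide boxes.
boxes_wh = [(30, 90), (35, 100), (28, 85), (120, 60), (130, 70), (110, 55)]
anchors = kmeans_anchors(boxes_wh, k=2)
print([(round(w, 1), round(h, 1)) for w, h in anchors])
# prints [(31.0, 91.7), (120.0, 61.7)]
```

With this sample data, the two resulting anchors are one tall shape and one wide shape, which mirrors the two groups of boxes in the input.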

To help visualize this concept, imagine a set of blocks with different shapes and dimensions: squares, rectangles, and so on. These blocks enclose different objects in an image, such as a person or a car. The shape and dimensions of each block give the model a cue about which kind of object it is likely to contain. For instance, in the image below, “box2” is clearly unsuitable for detecting persons. In such cases, the model learns to choose the anchor box that best matches the object’s shape and size.
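The notion of a "most appropriate" anchor can be sketched as a shape comparison. The example below scores each anchor against an object's width and height using intersection over union (IoU) with both boxes aligned at the same corner, then picks the highest score. This is one common matching criterion and not necessarily the one a particular YOLO version uses; the anchor sizes for "box1" and "box2" and the person box are made-up values, since the original figure's dimensions are not given.

```python
def shape_iou(box_wh, anchor_wh):
    """IoU of two boxes compared by shape only (both placed at the same corner)."""
    w, h = box_wh
    aw, ah = anchor_wh
    inter = min(w, aw) * min(h, ah)
    union = w * h + aw * ah - inter
    return inter / union


def best_anchor(box_wh, anchors):
    """Return the name of the anchor whose shape overlaps the box the most."""
    return max(anchors, key=lambda name: shape_iou(box_wh, anchors[name]))


# Hypothetical anchors (width, height) in pixels: box1 is tall, box2 is wide.
anchors = {"box1": (40, 100), "box2": (120, 60)}

# Hypothetical ground-truth box around a standing person.
person = (35, 95)

for name, wh in anchors.items():
    print(name, round(shape_iou(person, wh), 3))
print("best match:", best_anchor(person, anchors))
# prints "best match: box1"
```

Here the tall anchor overlaps the person box far more than the wide one does, which is the intuition behind saying that a wide box is a poor fit for persons.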
