YOLO (2015), YOLOv2 (2016), and YOLOv3 (2018)

Learn about the starting members of the biggest single-shot object detection family: YOLO, YOLOv2, YOLOv3.

You Only Look Once (YOLO) is among the most popular single-shot object detection models, largely because it has been improved continuously. The first version, called simply YOLO, predates the SSD architecture, and the family has kept improving over the years. We will trace how the YOLO family evolved, along with the novelties and improvements each new version introduced.

Let’s start with the base architecture created in 2015.

YOLO architecture

As a single-shot detector, YOLO performs object detection in one step, like SSD. But when we look at the architecture, we can see that it works in an even simpler way than SSD.

YOLO architecture

It takes a 448x448 image as input and extracts image features with its custom backbone, which is made of convolutional and max pooling layers. As the next step, two fully connected layers produce the final object detections, which are then filtered by the non-maximum suppression algorithm.
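
To make the last step more concrete, here is a minimal sketch of greedy non-maximum suppression in plain Python. It is an illustration, not YOLO's actual implementation: the function names, the `(x1, y1, x2, y2)` box format, the class-agnostic handling, and the IoU threshold of 0.5 are all assumptions chosen for the example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (if any).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return the indices of the kept boxes, highest score first.

    iou_threshold=0.5 is an assumed value for this sketch.
    """
    # Visit boxes from the most to the least confident.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a kept one.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Hypothetical detections: the first two boxes cover the same object.
boxes = [(10, 10, 110, 110), (12, 12, 112, 112), (200, 200, 300, 300)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
print(kept)
```

In this example, the second box overlaps heavily with the higher-scoring first box and is dropped, while the third box, which covers a different region, is kept.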

The final fully connected layer contains $S \times S \times B \times C$ nodes, where $S \times S$ represents the grid size, $B$ is the number of bounding boxes to predict from each pixel of ...