What is a YOLO model?

In the world of computer vision and object detection, YOLO (You Only Look Once) has emerged as a groundbreaking approach. It revolutionized the field by providing real-time object detection with impressive accuracy. YOLO’s innovation lies in its ability to detect objects in an image with a single pass through the neural network, unlike previous approaches that required multiple passes or sliding window techniques. This article provides a detailed exploration of what YOLO is, how it works, its variants, applications, and its impact.

What is YOLO?

YOLO, short for You Only Look Once, is an object detection algorithm developed by Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi in 2015. Its main purpose is to detect objects in images in real time. Traditional object detection algorithms slide a window across the image and apply a classifier to each window, which is computationally intensive and slow. YOLO, on the other hand, treats object detection as a single regression problem, predicting spatially separated bounding boxes and their corresponding class probabilities in one pass through the neural network.

How does YOLO work?

The YOLO algorithm divides the input image into a grid and, for each grid cell, predicts several bounding boxes together with class probabilities, all in the same forward pass. The original YOLO regresses box coordinates directly, while YOLOv2 and several later versions regress offsets from predefined anchor boxes, which are prior boxes with varying sizes and aspect ratios. The predicted bounding boxes are then filtered with a confidence score threshold so that only the most confident detections remain.
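
To make the grid formulation concrete, the sketch below computes the size of the prediction tensor for the configuration reported in the original YOLO paper (S = 7 grid, B = 2 boxes per cell, C = 20 classes) and applies a confidence threshold to a few made-up detections. The function names and the sample values are illustrative assumptions, not part of any YOLO codebase.

```python
def output_shape(s, b, c):
    """Shape of an original-YOLO-style prediction tensor: S x S x (B*5 + C)."""
    # Each of the B boxes contributes (x, y, w, h, confidence) = 5 values;
    # the C class probabilities are shared by the whole grid cell.
    return (s, s, b * 5 + c)


def filter_by_confidence(detections, threshold=0.5):
    """Keep only detections whose confidence is at least `threshold`."""
    return [d for d in detections if d["confidence"] >= threshold]


# Configuration from the original paper: 7 x 7 grid, 2 boxes, 20 classes.
print(output_shape(7, 2, 20))  # (7, 7, 30)

# Hypothetical detections, used only to show the thresholding step.
sample = [
    {"label": "dog", "confidence": 0.91},
    {"label": "cat", "confidence": 0.12},
    {"label": "car", "confidence": 0.55},
]
print([d["label"] for d in filter_by_confidence(sample, threshold=0.5)])
```

The threshold of 0.5 is an arbitrary choice for the example; in practice it is tuned per application to trade missed objects against false alarms.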

Here’s a step-by-step overview of how YOLO works:

  1. Input division: YOLO divides the input image into an S × S grid.

  2. Bounding box prediction: For each grid cell, YOLO predicts bounding boxes. Each bounding box has five components: (x, y, w, h, confidence).

    1. (x, y) denote the coordinates of the bounding box’s center in relation to the grid cell.

    2. (w, h) denote the width and height of the bounding box relative to the entire image.

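    3. confidence reflects both how likely the bounding box is to contain an object and how well the predicted box fits that object (in the original paper, the object probability multiplied by the IoU between the predicted and ground-truth boxes).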

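  3. Class prediction: YOLO also predicts a probability for each object class. The original YOLO predicts one set of class probabilities per grid cell, while later versions predict them per bounding box, typically with a softmax or with independent logistic (sigmoid) outputs.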

  4. Non-max suppression: To eliminate duplicate detections of the same object, YOLO uses non-maximum suppression (NMS). It selects the bounding box with the highest confidence score and removes any other boxes with high overlap (IoU) with it.

  5. Output: The final result of YOLO is a collection of bounding boxes, each paired with a class label and a confidence score.
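
The steps above can be sketched in plain Python. The example converts one cell-relative prediction into pixel coordinates and then runs a greedy non-maximum suppression over a few boxes. It is a minimal sketch under stated assumptions: the function names, the 448 × 448 image size, and the sample boxes are hypothetical, and real implementations run on tensors and usually apply NMS separately for each class.

```python
def cell_to_image_box(row, col, x, y, w, h, s, img_w, img_h):
    """Convert one grid-cell prediction to pixel corners (x1, y1, x2, y2).

    (x, y) are offsets in [0, 1] inside grid cell (row, col);
    (w, h) are fractions of the full image width and height.
    """
    cx = (col + x) / s * img_w  # box center in pixels
    cy = (row + y) / s * img_h
    bw, bh = w * img_w, h * img_h
    return (cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2)


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the chosen one too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


# A prediction centered in cell (3, 3) of a 7 x 7 grid on a 448 x 448 image.
print(cell_to_image_box(3, 3, 0.5, 0.5, 0.25, 0.5, 7, 448, 448))

# Two heavily overlapping boxes and one separate box (hypothetical values).
boxes = [(10, 10, 110, 110), (20, 20, 120, 120), (200, 200, 300, 300)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

The second box is removed because its IoU with the first (about 0.68) exceeds the 0.5 threshold, while the third box does not overlap either of them and is kept.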

YOLO preliminary architecture

YOLO variants

Since its inception, YOLO has undergone several iterations and improvements. Some notable variants include:

YOLOv2 (2016)

  • Introduced by Joseph Redmon and Ali Farhadi, YOLOv2 improved accuracy and speed over its predecessor.

  • This was achieved through deeper network architecture, batch normalization, anchor boxes for better bounding box prediction, and high-resolution classifiers for improved detection.
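
As an illustration of anchor-based prediction, the sketch below follows the decoding equations given in the YOLOv2 paper: the network outputs offsets (tx, ty, tw, th), the center is a sigmoid offset from the cell's top-left corner, and the size is the anchor's size scaled by an exponential. The function name and the sample anchor dimensions are assumptions made for the example; all values are in grid-cell units.

```python
import math


def sigmoid(t):
    """Logistic function, squashing any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))


def decode_anchor_box(tx, ty, tw, th, cell_x, cell_y, anchor_w, anchor_h):
    """Turn raw network offsets into a box (bx, by, bw, bh) in grid units."""
    bx = cell_x + sigmoid(tx)      # center stays inside the predicting cell
    by = cell_y + sigmoid(ty)
    bw = anchor_w * math.exp(tw)   # anchor width scaled by a positive factor
    bh = anchor_h * math.exp(th)
    return bx, by, bw, bh


# With all offsets at zero, the box sits at the cell center and
# has exactly the anchor's size (anchor values here are hypothetical).
print(decode_anchor_box(0.0, 0.0, 0.0, 0.0, cell_x=4, cell_y=6,
                        anchor_w=1.5, anchor_h=2.0))  # (4.5, 6.5, 1.5, 2.0)
```

Constraining the center with a sigmoid keeps each prediction tied to its own grid cell, which the YOLOv2 authors report makes training more stable than unconstrained offsets.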

YOLOv3 (2018)

  • YOLOv3 improved accuracy over YOLOv2, particularly on small objects, using a larger backbone (Darknet-53) while remaining fast enough for real-time use.

  • Key improvements included prediction at three different scales to handle objects of varying sizes, built on a feature-pyramid-style design that merges upsampled deep features with finer-grained earlier features for better localization.
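
A small arithmetic sketch shows what detecting at several scales means in practice. It assumes the commonly used YOLOv3 setup of a 416 × 416 input, strides of 32, 16, and 8, and three anchor boxes per scale; the function name is hypothetical.

```python
def multiscale_grids(input_size=416, strides=(32, 16, 8), anchors_per_scale=3):
    """Return the grid size at each scale and the total number of boxes."""
    grids = [input_size // stride for stride in strides]
    # Every cell at every scale predicts `anchors_per_scale` boxes.
    total_boxes = sum(g * g * anchors_per_scale for g in grids)
    return grids, total_boxes


grids, total = multiscale_grids()
print(grids)  # [13, 26, 52]
print(total)  # 10647
```

The coarse 13 × 13 grid is suited to large objects and the fine 52 × 52 grid to small ones, which is why confidence filtering and non-maximum suppression are needed to reduce thousands of candidate boxes to a handful of detections.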

YOLOv4 (2020)

  • Focused on achieving a balance between accuracy and speed, YOLOv4 incorporated advancements like CSPDarknet53 as the backbone for efficiency, various data augmentation techniques to improve generalization, and novel activation functions for enhanced performance. CSPDarknet53 is a neural network backbone that enhances the Darknet53 architecture by incorporating Cross Stage Partial (CSP) connections to improve gradient flow and reduce computation while maintaining high performance in object detection tasks.
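
The text does not name the activation functions involved; one that the YOLOv4 paper uses in its backbone is Mish, defined as x · tanh(softplus(x)). The sketch below is a scalar, pure-Python version written only to show the formula, on the assumption that Mish is the activation meant here. Deep learning frameworks provide their own vectorized implementations.

```python
import math


def softplus(x):
    """Numerically stable softplus: log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x):
    """Mish activation: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


# Mish is close to the identity for large positive inputs and
# lets small negative values through instead of clipping them to zero.
for value in (-2.0, 0.0, 1.0, 5.0):
    print(value, mish(value))
```

Unlike ReLU, Mish is smooth everywhere and slightly negative for negative inputs, which is the property usually cited as its advantage.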

YOLOv5 (2020)

  • Developed by Ultralytics with a focus on usability and performance, YOLOv5 boasts a streamlined architecture for ease of use and training, an efficient training pipeline with a focus on speed, and state-of-the-art performance on various object detection benchmarks.

YOLOv6 (September 2022)

  • Developed by Meituan researchers to balance speed and accuracy, YOLOv6 introduced the Bidirectional Concatenation (BiC) module for improved information flow, an anchor-aided training (AAT) strategy for efficient learning, and an enhanced backbone and neck design for better performance. BiC is a module in the detector’s neck that combines feature maps from adjacent layers to provide more accurate localization signals. AAT adds an anchor-based auxiliary branch during training so that the detector benefits from both anchor-based and anchor-free approaches without extra cost at inference. YOLOv6 also offers multiple pretrained models (YOLOv6-N, YOLOv6-S, YOLOv6-M, YOLOv6-L) catering to different speed-accuracy needs.

YOLOv7 (July 2022)

  • At the time of its release, YOLOv7 reported the best speed-accuracy trade-off among real-time object detectors, which it achieved through architectural changes and training-time optimizations that do not increase inference cost.

YOLOv8 (January 2023)

  • Building upon the success of YOLOv5, YOLOv8, developed by Ultralytics, introduces new features for enhanced performance and flexibility. It utilizes anchor-free detection and new convolutional layers for improved predictions.

YOLOv9 (February 2024)

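  • The latest addition to the YOLO family at the time of writing, YOLOv9 reports a higher mAP than previous versions on the MS COCO dataset. It was introduced in the paper “YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information,” which proposes Programmable Gradient Information (PGI) together with a new architecture, the Generalized Efficient Layer Aggregation Network (GELAN). Open-source code is available for training custom YOLOv9 models.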

Applications of YOLO

The YOLO algorithm has found applications across diverse domains, including:

  • Autonomous driving: YOLO is used for object detection in autonomous vehicles to identify pedestrians, vehicles, cyclists, and other objects in the vehicle’s surroundings.

  • Surveillance and security: YOLO is employed in surveillance systems for real-time monitoring, intrusion detection, and face detection.

  • Medical imaging: YOLO aids in medical imaging tasks such as tumor detection, organ segmentation, and disease diagnosis.

  • Retail and inventory management: YOLO is utilized in retail environments for shelf monitoring, product recognition, and inventory management.

  • Sports analytics: YOLO is applied in sports analytics for player tracking, ball detection, and action recognition in various sports.

Conclusion

YOLO (You Only Look Once) has significantly advanced the field of object detection by providing real-time detection with impressive accuracy. Its innovative approach of formulating object detection as a regression problem and predicting bounding boxes and class probabilities in a single pass through the network has paved the way for numerous applications across diverse domains. With continuous improvements and variants, YOLO remains at the forefront of object detection research and technology, empowering various industries with its capabilities.


Copyright ©2024 Educative, Inc. All rights reserved