Types of Object Detection Algorithms
Explore two types of deep learning-based detection algorithms.
Deep learning is instrumental in the constant evolution of object detection. At present, deep learning-based object detectors can be divided into these two categories:
One-stage detectors
Two-stage detectors
How do two-stage object detectors work?
As the name suggests, these types of detectors divide the OD task into two networks:
ROI (region of interest) extraction/region proposal network: This network tries to identify/detect the regions in an image that might contain objects to pass on to the next stage. This helps in reducing computational requirements because instead of the complete image, only relevant regions are passed on to the next stage. The functions of a RPN (region proposal network) are as follows:
Generate regions by sliding a window of different sizes over the image.
Check if any object (irrespective of the class) can be present in that region by predicting the probability of an object being present in each region.
If yes, pass the coordinates of that region to the next network.
Localization and classification: An OD task requires not just classification but also localization of objects. This is a 2-step process:
Classification: We classify the object for each ROI generated in the previous layer. This outputs the probability of an object belonging to a particular class, also referred to as class score.
Localization: The OD model not only needs to identify the objects correctly but should also predict the bounding box fitting the object closely. Bounding box refinement happens in this stage.
How are the one-stage and two-stage detectors different?
One-stage detectors prioritize speed and efficiency by combining object localization and classification into a single pass, while two-stage detectors focus on achieving higher accuracy through a separate region proposal and classification step.
The key difference between these two detectors lies in their training method. A one-stage detector performs both classification and localization. This makes it much faster and simpler compared to a two-stage detector.
One-stage detectors like YOLO skip the region proposal stage and the box refinement step.
One-stage detectors treat object detection as a single, unified regression problem that involves predicting the coordinates of bounding boxes and associated class probabilities directly from the raw image pixels. However, two-stage detectors use a combination of classification and regression to predict bounding box locations.
How do single-stage detectors work if there is no ROI detection?
Without ROI detection, a complete image is passed through the network, which performs classification and localization in a single pass. For most of the detectors, the image is divided into a grid of cells, and then these grids are used for further computations. YOLOv1 is used to predict a fixed number of bounding boxes for each cell. The issue with this approach is that the number of bounding boxes predicted can be pretty high. So, to handle this problem, a concept called NMS (non-maximum suppression) is used.