Object Detection: Data Strategy & Augmentation

Understand four critical data strategy pillars for object detection in autonomous driving. Learn to balance annotation types, handle rare events, integrate synthetic data, and apply active learning to optimize cost and performance while ensuring safety. This lesson forms the foundation for effective model design in real-world perception systems.

Passive fleet logging alone cannot close the rare-event data gap identified in the previous lesson. A self-driving perception system that logs millions of routine highway frames will still lack sufficient examples of a child darting between parked cars or a wheelchair user crossing at dusk. This gap makes data strategy the highest-leverage design decision in safety-critical object detection, and interviewers know it.

Consider the core interview scenario: you are asked to design the data pipeline for an autonomous vehicle perception system that must detect pedestrians, cyclists, and edge-case obstacles with high recall under diverse conditions. At L5 and Staff+ levels, interviewers expect you to articulate annotation type trade-offs, rare-event mitigation, synthetic data integration, and annotation cost optimization before discussing any model architecture. This lesson covers each of these four pillars in depth, building the data foundation that the next lesson’s model architecture and compression decisions will depend on.

Annotation types for sensor fusion

Modern perception stacks rely on three primary annotation types, each occupying a different point on the cost-accuracy spectrum. Understanding when each is justified separates production-aware candidates from those who default to bounding boxes for everything.

  • 2D bounding boxes: These rectangular annotations around objects are the fastest to produce and are sufficient for single-camera detection training. However, they discard spatial depth entirely and become less reliable when objects overlap or are partially occluded. A bounding box around a pedestrian half-hidden behind a parked car also encloses pixels belonging to the car and the surrounding background, introducing noise into the training signal.

  • Segmentation masks: Instance segmentation masks are pixel-level annotations that delineate the exact boundary of each individual object instance in a frame, distinguishing overlapping objects of the same class. These precise boundaries are critical for occlusion reasoning: a segmentation mask cleanly separates the visible portion of that partially hidden pedestrian from the car in front. The trade-off is cost. Masks are typically 5–10x more expensive per frame than bounding boxes because annotators must trace the full outline of every object instance.

  • 3D point cloud annotations: Generated from LiDAR sensor data, these encode full spatial depth and enable sensor fusion with camera imagery. They provide ground-truth 3D bounding boxes essential for distance estimation and trajectory prediction. The cost ...
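
To make the bounding-box noise argument concrete, here is a minimal sketch that measures how much of a tight 2D box is not actually the object. The 8x8 mask is a hypothetical toy example, and the helper names `tight_box` and `background_fraction` are illustrative only; they do not come from any particular annotation tool.

```python
# Toy 8x8 frame: 1 marks pixels of the visible pedestrian, 0 is everything else.
# The shape is hypothetical, chosen only to mimic an irregular, partly hidden figure.
MASK = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]


def tight_box(mask):
    """Return (row_min, col_min, row_max, col_max), inclusive, around all object pixels."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    if not rows:
        raise ValueError("mask contains no object pixels")
    return rows[0], cols[0], rows[-1], cols[-1]


def background_fraction(mask):
    """Fraction of the tight box's area that is not covered by the object mask."""
    r0, c0, r1, c1 = tight_box(mask)
    box_area = (r1 - r0 + 1) * (c1 - c0 + 1)
    object_pixels = sum(sum(row) for row in mask)
    return 1 - object_pixels / box_area


print(tight_box(MASK))                     # (1, 2, 6, 5)
print(f"{background_fraction(MASK):.3f}")  # 0.375
```

In this toy case, more than a third of the box is non-object pixels that a box-supervised detector has to treat as part of the pedestrian, while a mask label excludes them by construction. Real ratios depend on object shape and occlusion, so the number here is purely illustrative.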