This post is the first in a series on Autonomous Driving at Scale, developed with Tata Consultancy Services (TCS). In this post, we provide a general overview of deep learning inference for object detection.
Autonomous vehicle software development requires data, computation, and algorithmic innovation at a massive scale that only GPUs can deliver. An array of neural networks forms the basis of the perception and decision-making systems. Neural network performance generally improves as the amount of training data grows, which in turn requires infrastructure that supports training and inference at scale.
For autonomous vehicles (AVs) to reach acceptable levels of safety, they must be trained on massive amounts of real-world driving data encompassing the diversity of situations that a car could encounter daily. These training scenarios are collected by fleets of vehicles fitted with multiple sensors that drive for hours each day, generating petabytes of data. This data must then be annotated and processed for comprehensive AV development, testing, and validation.
A large part of AV software is the perception stack, which enables the vehicle to detect, track, and classify objects and to estimate distances. A perception algorithm developer aims to create high-performance, robust algorithms that can accurately detect other vehicles, lanes, static and moving objects, pedestrians, and traffic lights at crossings and intersections in any scenario. These scenarios span a range of ambient conditions, such as inside a tunnel, on a pitch-dark highway, or in glaring sunlight. To work effectively, these algorithms need a steady influx of high-quality annotated (labeled) data for training.
In the case of object detection, the goal is not only to detect objects in a single frame but also to determine where they are in the frame. Each object needs to be identified, classified, and assigned to the correct class. You can achieve these objectives with a bounding box, which marks the location of the object and is accompanied by a class label and a confidence score.
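As a rough illustration of what a detector's output might look like, the following minimal Python sketch represents each detection as a class label, a bounding box, and a confidence score, and then keeps only the detections above a confidence threshold. The `Detection` class, the `filter_detections` helper, the pixel coordinates, and the 0.5 threshold are all hypothetical choices made for this example; they are not taken from any particular framework or from the pipeline described in this series.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Detection:
    """One detected object (hypothetical structure for illustration)."""
    label: str         # predicted class, e.g. "car"
    x_min: float       # left edge of the bounding box, in pixels
    y_min: float       # top edge of the bounding box, in pixels
    x_max: float       # right edge of the bounding box, in pixels
    y_max: float       # bottom edge of the bounding box, in pixels
    confidence: float  # score in [0, 1] assigned by the model

    def area(self) -> float:
        """Area of the bounding box in square pixels."""
        width = max(0.0, self.x_max - self.x_min)
        height = max(0.0, self.y_max - self.y_min)
        return width * height


def filter_detections(detections: List[Detection],
                      min_confidence: float = 0.5) -> List[Detection]:
    """Keep only detections whose score meets the (assumed) threshold."""
    return [d for d in detections if d.confidence >= min_confidence]


# Made-up detections for a single camera frame.
detections = [
    Detection("car", 120.0, 200.0, 360.0, 340.0, 0.94),
    Detection("pedestrian", 400.0, 180.0, 440.0, 300.0, 0.81),
    Detection("traffic_light", 610.0, 40.0, 630.0, 90.0, 0.32),
]

kept = filter_detections(detections, min_confidence=0.5)
for d in kept:
    print(f"{d.label}: box=({d.x_min}, {d.y_min}, {d.x_max}, {d.y_max}) "
          f"score={d.confidence:.2f}")
```

In this sketch, the low-scoring traffic light detection is dropped. In practice, the threshold is a tuning choice that trades missed objects against false alarms.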
Read the full blog on the NVIDIA Developer Blog.