DRIVE Labs: Visual Feature Tracking for Autonomous Vehicles

By: Yue Wu, Cheng-Chieh Yang, Xin Tong

Editor’s note: This is the latest post in our NVIDIA DRIVE Labs series, which takes an engineering-focused look at individual autonomous vehicle challenges and how NVIDIA DRIVE addresses them. Catch up on all of our automotive posts here.

Feature tracking, the estimation of pixel-level correspondences and pixel-level changes between adjacent video frames, provides critical temporal and geometric information for object motion and velocity estimation, camera self-calibration and visual odometry.

Accurate and stable feature tracks translate to accurate time-to-collision estimates in obstacle perception and robust calculations of camera sensors’ extrinsic calibration (pitch/yaw/roll) values, and they provide a key visual input for generating a three-dimensional world representation in visual odometry. Because feature tracking is based on pixel-level computations, a high-performance computing platform is foundational to a practical implementation.
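To make the link between feature tracks and time-to-collision concrete, here is a minimal sketch of one textbook approach, not necessarily the method used in NVIDIA DRIVE Software. It assumes the tracked points lie on a single rigid obstacle at roughly the same depth and that the closing speed is constant. Under those assumptions the apparent size of the obstacle grows by a scale ratio s between frames, and time-to-collision is approximately dt / (s - 1). The function names are hypothetical.

```python
import math
from itertools import combinations


def scale_ratio(prev_pts, curr_pts):
    """Median ratio of pairwise distances between the same tracked points
    in the previous and current frame (assumed formulation)."""
    ratios = []
    for i, j in combinations(range(len(prev_pts)), 2):
        d_prev = math.dist(prev_pts[i], prev_pts[j])
        d_curr = math.dist(curr_pts[i], curr_pts[j])
        if d_prev > 1e-6:  # skip nearly coincident points
            ratios.append(d_curr / d_prev)
    if not ratios:
        raise ValueError("need at least two distinct tracked points")
    ratios.sort()
    return ratios[len(ratios) // 2]  # median is robust to a few bad tracks


def time_to_collision(prev_pts, curr_pts, dt):
    """Seconds until contact, assuming constant closing speed.
    Returns infinity when the obstacle is not growing in the image."""
    s = scale_ratio(prev_pts, curr_pts)
    if s <= 1.0:
        return math.inf
    return dt / (s - 1.0)


# Four tracked points on an obstacle that appears 5% larger one frame later.
prev_pts = [(100.0, 100.0), (120.0, 100.0), (100.0, 120.0), (120.0, 120.0)]
curr_pts = [(99.5, 99.5), (120.5, 99.5), (99.5, 120.5), (120.5, 120.5)]
print(time_to_collision(prev_pts, curr_pts, dt=1.0 / 30.0))  # roughly 0.67 s
```

Because the estimate divides by s - 1, small errors in the tracks produce large errors in time-to-collision when the obstacle is far away, which is one reason track accuracy and stability matter here.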

In this DRIVE Labs video, we’ll walk through how NVIDIA DRIVE Software approaches feature tracking for robust autonomous driving.

As the vehicle drives, pixel-level information can become distorted due to illumination changes, viewpoint changes and the complexities associated with the motion of non-rigid objects in the scene. Computer vision offers a few common algorithmic approaches to feature tracking: 1) feature tracking with dense optical flow; 2) feature tracking with sparse optical flow; and 3) deep learning-based methods.

Obtaining accurate, sufficiently diverse pixel-level correspondence training data for deep learning-based optical flow computation is nontrivial, so traditional computer vision methods offer an important advantage here. To optimize trade-offs among accuracy, robustness and runtime efficiency for autonomous driving, we pursue the sparse optical flow-based feature tracking approach. Specifically, instead of exhaustively computing optical flow for each pixel in the image (dense optical flow), we exploit the computational advantages of sparsity and compute optical flow only for important feature points.
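The sparse idea can be illustrated with a single Lucas-Kanade step, a classic sparse optical flow method. The post does not say which solver NVIDIA DRIVE Software uses, so treat this as an assumption chosen for illustration. The sketch solves a 2x2 least-squares system over a small window around each feature point and skips every other pixel. The names `lk_step` and `synthetic_frame` are hypothetical, and images are plain nested lists to keep the example dependency-free.

```python
import math


def lk_step(img0, img1, x, y, radius=7):
    """One Lucas-Kanade step at integer pixel (x, y).
    Returns the estimated flow (u, v) in pixels, or None if the window
    has too little texture. The caller must keep (x, y) at least
    radius + 1 pixels away from the image border."""
    gxx = gxy = gyy = bx = by = 0.0
    for r in range(y - radius, y + radius + 1):
        for c in range(x - radius, x + radius + 1):
            ix = 0.5 * (img0[r][c + 1] - img0[r][c - 1])  # horizontal gradient
            iy = 0.5 * (img0[r + 1][c] - img0[r - 1][c])  # vertical gradient
            it = img1[r][c] - img0[r][c]                  # temporal difference
            gxx += ix * ix
            gxy += ix * iy
            gyy += iy * iy
            bx -= ix * it
            by -= iy * it
    det = gxx * gyy - gxy * gxy
    if abs(det) < 1e-9:
        return None
    # Solve [[gxx, gxy], [gxy, gyy]] @ [u, v] = [bx, by] by Cramer's rule.
    return (gyy * bx - gxy * by) / det, (gxx * by - gxy * bx) / det


def synthetic_frame(shift_x, shift_y, size=40):
    """Smooth test pattern whose content is shifted by (shift_x, shift_y)."""
    return [[100.0 * math.sin(0.3 * (c - shift_x))
             + 100.0 * math.cos(0.25 * (r - shift_y))
             for c in range(size)] for r in range(size)]


frame0 = synthetic_frame(0.0, 0.0)
frame1 = synthetic_frame(0.2, -0.1)  # content moves 0.2 px right, 0.1 px up
features = [(20, 20), (18, 22)]      # flow is computed only at these points
flows = {p: lk_step(frame0, frame1, p[0], p[1]) for p in features}
print(flows)
```

The cost scales with the number of feature points times the window area rather than with the full image area, which is the computational advantage of sparsity described above.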

This approach comprises three major steps: 1) image preprocessing; 2) feature detection; and 3) feature tracking across frames.

The image preprocessing step extracts gradient information from the image. The feature detection step then uses this information to identify salient feature points in the image that can be robustly tracked across frames. Finally, the optical flow-based feature tracking step tracks the detected features and estimates their motion across adjacent frames in the video sequence.
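The first two steps can be sketched as follows. The post does not name the detector, so this example assumes a Shi-Tomasi style score (the smaller eigenvalue of the local gradient structure tensor), which is one common way to pick points that are easy to track. The function names are hypothetical.

```python
import math


def gradients(img):
    """Preprocessing: central-difference gradients (border pixels left at 0)."""
    h, w = len(img), len(img[0])
    gx = [[0.0] * w for _ in range(h)]
    gy = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx[r][c] = 0.5 * (img[r][c + 1] - img[r][c - 1])
            gy[r][c] = 0.5 * (img[r + 1][c] - img[r - 1][c])
    return gx, gy


def min_eigenvalue_score(gx, gy, x, y, radius=2):
    """Detection: smaller eigenvalue of the structure tensor summed over a
    window around (x, y). It is large only when the intensity varies in two
    directions, that is, at corner-like points."""
    sxx = sxy = syy = 0.0
    for r in range(y - radius, y + radius + 1):
        for c in range(x - radius, x + radius + 1):
            sxx += gx[r][c] * gx[r][c]
            sxy += gx[r][c] * gy[r][c]
            syy += gy[r][c] * gy[r][c]
    half_trace = 0.5 * (sxx + syy)
    root = math.sqrt((0.5 * (sxx - syy)) ** 2 + sxy * sxy)
    return half_trace - root


# A dark image with a bright lower-right quadrant: one corner at (10, 10).
image = [[1.0 if (r >= 10 and c >= 10) else 0.0 for c in range(20)]
         for r in range(20)]
gx, gy = gradients(image)
print("corner:", min_eigenvalue_score(gx, gy, 10, 10))
print("edge:  ", min_eigenvalue_score(gx, gy, 10, 15))
print("flat:  ", min_eigenvalue_score(gx, gy, 4, 4))
```

The corner scores highest, while the straight edge and the flat region score zero. A point on a straight edge can slide along the edge without changing its appearance, so it cannot be tracked reliably in both directions.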

Caption: Feature tracking algorithm running on six-camera surround perception setup, with feature tracks shown in blue.

For safe autonomous driving built on embedded computing platforms, it is crucial to balance accuracy and performance in real-time feature tracking. We’ve designed a sophisticated feature density control algorithm with this balance in mind, ensuring the detected sparse features cover the image regions that are most important for self-driving. Additionally, we leverage a coarse-to-fine feature tracking strategy to enhance computational speed and robustness.
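The details of the density control algorithm are not given in this post. As a rough illustration of the general idea only, the sketch below assumes a uniform grid and keeps the strongest few features per cell, so that features do not cluster in one highly textured region. It also builds a simple image pyramid of the kind a coarse-to-fine tracker operates on. All names and parameters are hypothetical.

```python
def limit_density(features, width, height, cell=32, per_cell=2):
    """Keep at most per_cell highest-scoring features in each grid cell.
    features is a list of (x, y, score) tuples."""
    buckets = {}
    for x, y, score in features:
        if 0 <= x < width and 0 <= y < height:
            key = (int(x) // cell, int(y) // cell)
            buckets.setdefault(key, []).append((x, y, score))
    kept = []
    for key in sorted(buckets):
        strongest = sorted(buckets[key], key=lambda f: f[2], reverse=True)
        kept.extend(strongest[:per_cell])
    return kept


def downsample(img):
    """Halve the resolution by averaging each 2x2 block."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[0.25 * (img[2 * r][2 * c] + img[2 * r][2 * c + 1]
                     + img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1])
             for c in range(w)] for r in range(h)]


def build_pyramid(img, levels=3):
    """Level 0 is the full-resolution image; each next level is half size."""
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid


detected = [(5, 5, 0.9), (6, 7, 0.5), (8, 3, 0.7), (40, 40, 0.1)]
print(limit_density(detected, width=64, height=64))
# three features remain: the two strongest in the first cell, one in the other

pyramid = build_pyramid([[1.0] * 8 for _ in range(8)], levels=3)
print([(len(level), len(level[0])) for level in pyramid])
```

In a coarse-to-fine scheme, a motion of d pixels at full resolution is only d / 2^k pixels at pyramid level k. The tracker can therefore estimate large motions at the coarsest level, where they appear small, then double the estimate and refine it at each finer level.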

We have enabled sparse feature tracking to run simultaneously on multiple cameras. The feature tracker implementation for both front and surround camera perception configurations is available to developers starting with the NVIDIA DRIVE Software 9.0 release.