AI Co-Pilot: RNNs for Dynamic Facial Analysis

NVIDIA AI Co-Pilot combines deep learning and visual computing to enable augmented driving, using sensor data from a microphone inside the car and from interior and exterior cameras to track the environment around the driver. Co-Pilot tracks where you are looking while driving to identify objects you might not see, such as what’s coming up alongside you, ahead of you, and in your blind spot. This lets the car understand the driver as well as its environment and provide suggestions, warnings and, where needed, interventions for a safer and more enjoyable experience. A key component of AI Co-Pilot is the technology for continuous real-time monitoring of the driver’s posture and gaze.

Below is a video from NVIDIA CEO Jen-Hsun Huang’s CES 2017 keynote introducing AI Co-Pilot.

Estimating facial features such as head pose and facial landmarks from images is key for many applications, including activity recognition, human-computer interaction, and facial motion capture. While most prior work has focused on facial feature estimation from a single image, videos provide temporal links among nearby image frames, which are essential for accurate and robust estimation. A key challenge for video-based facial analysis is to properly exploit this temporal coherence.

A new post on the NVIDIA Developer blog describes the use of recurrent neural networks (RNNs) for joint estimation and tracking of facial features in videos. As a generic, learning-based approach to time series prediction, RNNs avoid manual tracker engineering for tasks performed on videos, much like convolutional neural networks (CNNs) avoid manual feature engineering for tasks performed on still images.
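To make the idea concrete, here is a minimal sketch of how a recurrent network carries information from one video frame to the next. It is not NVIDIA's implementation: the function names (`make_rnn`, `rnn_step`, `track_head_pose`), the layer sizes, and the choice of a plain Elman-style RNN with three outputs standing in for head pose angles are all assumptions made for illustration, and the weights are random rather than trained. In a real system the per-frame features would likely come from a CNN.

```python
import math
import random


def make_rnn(input_size, hidden_size, output_size, seed=0):
    """Create small random (untrained) weights for an Elman-style RNN."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "W_xh": mat(hidden_size, input_size),   # frame features -> hidden
        "W_hh": mat(hidden_size, hidden_size),  # previous hidden -> hidden
        "W_hy": mat(output_size, hidden_size),  # hidden -> pose estimate
        "hidden_size": hidden_size,
    }


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rnn_step(params, frame_features, hidden):
    """Combine the current frame with the state carried over from earlier frames."""
    from_input = matvec(params["W_xh"], frame_features)
    from_state = matvec(params["W_hh"], hidden)
    new_hidden = [math.tanh(a + b) for a, b in zip(from_input, from_state)]
    pose = matvec(params["W_hy"], new_hidden)  # hypothetical yaw, pitch, roll
    return pose, new_hidden


def track_head_pose(params, video_features):
    """Run the RNN over a sequence of per-frame feature vectors."""
    hidden = [0.0] * params["hidden_size"]
    poses = []
    for frame_features in video_features:
        pose, hidden = rnn_step(params, frame_features, hidden)
        poses.append(pose)
    return poses


if __name__ == "__main__":
    rnn = make_rnn(input_size=4, hidden_size=8, output_size=3)
    # Five identical placeholder frames; real features would vary per frame.
    video = [[0.5, -0.2, 0.1, 0.9]] * 5
    for t, pose in enumerate(track_head_pose(rnn, video)):
        print(t, [round(v, 4) for v in pose])
```

Even though every frame in the example is identical, the estimates change from frame to frame, because each step also depends on the hidden state accumulated from earlier frames. That carried state is what lets a trained RNN take over the role of a hand-built tracker.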

NVIDIA recently incorporated RNN-based facial analysis into the AI Co-Pilot platform to improve the overall driving experience, and published this work in IEEE CVPR 2017.
