NVIDIA Jetson Community Project Spotlight: Point-Voxel CNN for Efficient 3D Deep Learning

3D deep learning is used in a variety of applications including robotics, AR/VR systems, and autonomous machines.

This month’s Jetson Community Project spotlight features researchers from MIT’s Han Lab, who developed an efficient deep learning method for 3D object segmentation that is designed to run on edge devices.

“We present Point-Voxel CNN (PVCNN) for efficient, fast 3D deep learning. Previous work processes 3D data using either voxel-based or point-based neural network models. However, both approaches are computationally inefficient,” the researchers explained in their paper, Point-Voxel CNN for Efficient 3D Deep Learning, presented at NeurIPS 2019.

Unlike previous state-of-the-art methods, the model represents the 3D input data as points, which helps reduce the overall memory footprint, while performing its convolutions on voxels, which makes memory access more regular.
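To make the point-based and voxel-based representations concrete, here is a minimal sketch in plain Python. It is a simplified illustration, not the authors' implementation: the function names `voxelize` and `devoxelize` are hypothetical, each point carries a single scalar feature, and features are copied back to points with a nearest-voxel lookup, which is a simplification chosen for brevity.

```python
from collections import defaultdict


def voxelize(points, features, resolution):
    """Average scalar point features into a sparse resolution^3 voxel grid.

    Returns the grid (a dict keyed by voxel index) and the voxel index
    assigned to each input point. Hypothetical helper for illustration.
    """
    # Shift the cloud to the origin and scale all axes by one common
    # extent so the shape is not distorted.
    mins = [min(p[i] for p in points) for i in range(3)]
    maxs = [max(p[i] for p in points) for i in range(3)]
    extent = max(maxs[i] - mins[i] for i in range(3)) or 1.0

    sums = defaultdict(float)
    counts = defaultdict(int)
    indices = []
    for p, f in zip(points, features):
        # Map each coordinate to an integer cell, clamping the upper edge.
        idx = tuple(
            min(int((p[i] - mins[i]) / extent * resolution), resolution - 1)
            for i in range(3)
        )
        indices.append(idx)
        sums[idx] += f
        counts[idx] += 1

    # Only occupied voxels are stored, so empty space costs no memory here.
    grid = {idx: sums[idx] / counts[idx] for idx in sums}
    return grid, indices


def devoxelize(grid, indices):
    """Copy each voxel's feature back to the points that fell inside it."""
    return [grid[idx] for idx in indices]


points = [(0.0, 0.0, 0.0), (0.1, 0.1, 0.1), (1.0, 1.0, 1.0)]
features = [1.0, 3.0, 10.0]

grid, indices = voxelize(points, features, resolution=2)
per_point = devoxelize(grid, indices)
print(grid)
print(per_point)
```

In this toy example the first two points share a voxel, so their features are averaged, while the third point keeps its own value. A real network would apply a 3D convolution to the grid between the two steps and would typically store the grid as a dense tensor.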

PVCNN can run on an NVIDIA Jetson Nano, with the cuDNN-accelerated PyTorch deep learning framework, at 20 objects per second. That’s 2.5x higher than the previous state-of-the-art model, PointNet, which runs at 8 objects per second.

On a Jetson AGX Xavier, the network takes just 2.7 seconds to process more than one million points, while the PointNet model takes 4.1 seconds.
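As a quick sanity check, the ratios implied by the quoted Jetson Nano and Jetson AGX Xavier figures can be worked out directly. The snippet below uses only the numbers reported in this article; the variable names are illustrative, and the point count is treated as exactly one million, so the points-per-second values are lower bounds.

```python
# Jetson Nano throughput, in objects per second (figures from the article).
nano_pvcnn = 20
nano_pointnet = 8
throughput_ratio = nano_pvcnn / nano_pointnet

# Jetson AGX Xavier latency, in seconds, for "more than one million points".
xavier_pvcnn_s = 2.7
xavier_pointnet_s = 4.1
latency_ratio = xavier_pointnet_s / xavier_pvcnn_s

# Lower bound on points processed per second, assuming exactly 1,000,000 points.
num_points = 1_000_000
pvcnn_points_per_s = num_points / xavier_pvcnn_s
pointnet_points_per_s = num_points / xavier_pointnet_s

print(f"Nano throughput ratio: {throughput_ratio:.1f}x")
print(f"Xavier latency ratio:  {latency_ratio:.2f}x")
print(f"PVCNN:    at least {pvcnn_points_per_s:,.0f} points/s")
print(f"PointNet: at least {pointnet_points_per_s:,.0f} points/s")
```

On these figures, PVCNN is 2.5x faster than PointNet on the Jetson Nano and roughly 1.5x faster on the Jetson AGX Xavier.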

“Evaluated on semantic and part segmentation datasets, [PVCNN] achieves much higher accuracy than the voxel-based baseline with 10x GPU memory reduction; it also outperforms the state-of-the-art point-based models with 7x measured speedup on average.”

The team also showed how PVCNN can be used on an autonomous racing vehicle.

“Extensive experiments on multiple tasks consistently demonstrate the effectiveness and efficiency of our proposed method,” the researchers said. “We believe that our research will break the stereotype that the voxel-based convolution is naturally inefficient and shed light on co-designing the voxel-based and point-based network architectures.”

The researchers have published their PyTorch implementation of PVCNN on GitHub.