In this post, we detail the new features of the A100 that make NVIDIA GPUs even more capable for computer vision (CV) workloads. We also showcase two recent CV research projects from NVIDIA Research, Hierarchical Multi-Scale Attention for Semantic Segmentation and Bi3D: Stereo Depth Estimation via Binary Classifications, and show how they benefit from the A100.
The NVIDIA A100 is built on the largest 7 nm chip ever made, with 54B transistors, and comes with 40 GB of HBM2 GPU memory delivering 1.5 TB/s of memory bandwidth. The A100 offers up to 624 TFLOPS of FP16 arithmetic throughput for deep learning (DL) training and up to 1,248 TOPS of INT8 arithmetic throughput for DL inference. Both peak figures assume structured sparsity, which doubles the dense rates of 312 TFLOPS and 624 TOPS. At a high level, the A100 is packed with a suite of new features:
- Multi-Instance GPU (MIG) allows the A100 Tensor Core GPU to be securely partitioned into as many as seven separate GPU instances for CUDA applications
- Third-generation Tensor Cores with TensorFloat-32 (TF32) instructions that accelerate processing of FP32 data
- Third-generation NVLink at 10X the interconnect speed of PCIe Gen 4
- More media hardware for CV workloads: five video decoders compared to one on the V100, five new hardware JPEG decoder engines, and new, improved hardware for optical flow
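TF32 keeps FP32's 8-bit exponent, so it covers the same dynamic range, but carries only 10 explicit mantissa bits, the same precision as FP16. The minimal sketch below emulates what rounding an FP32 value to TF32 precision does to a number. It is an illustration only: the function name `to_tf32` is hypothetical, it assumes round-to-nearest-even (the conversion that real Tensor Core or library paths apply may differ), and it does not handle infinities or NaN.

```python
import struct

def to_tf32(x: float) -> float:
    """Hypothetical helper: round a finite value to TF32 precision
    (1 sign bit, 8 exponent bits, 10 mantissa bits).

    Assumes round-to-nearest-even; real hardware/library conversion may differ.
    The input is first rounded to FP32 by struct.pack.
    """
    # Reinterpret the value as its 32-bit FP32 bit pattern.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # FP32 has 23 mantissa bits; TF32 keeps the top 10, so 13 low bits are dropped.
    drop = 13
    # Lowest kept mantissa bit, used to break exact ties toward an even result.
    lsb = (bits >> drop) & 1
    # Adding (half - 1) + lsb and then clearing the dropped bits rounds to nearest even.
    bias = (1 << (drop - 1)) - 1 + lsb
    bits = (bits + bias) & ~((1 << drop) - 1)
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]

if __name__ == "__main__":
    print(to_tf32(1 / 3))  # 0.333251953125 (FP32 would give ~0.3333333)
    print(to_tf32(1e30))   # magnitude preserved: same exponent range as FP32
```

Because the exponent field is untouched, large and small FP32 values stay representable; only the relative precision drops to about 2^-11, which is why FP32 training code can often use TF32 math without changes to its data types.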
Read the full post, Improving Computer Vision with NVIDIA A100 GPUs, on the NVIDIA Developer Blog.