Ampere Tensor Cores introduce a novel math mode dedicated for AI training: the TensorFloat-32 (TF32). TF32 is designed to accelerate the processing of FP32 data types, commonly used in DL workloads. On NVIDIA A100 Tensor Cores, the throughput of mathematical operations running in TF32 format is up to 10x more than FP32 running on the prior Volta-generation V100 GPU, resulting in up to 5.7x higher performance for DL workloads.
Every month, NVIDIA releases containers for DL frameworks on NVIDIA NGC, all optimized for NVIDIA GPUs: TensorFlow 1, TensorFlow 2, PyTorch, and “NVIDIA Optimized Deep Learning Framework, powered by Apache MXNet”. Starting from the 20.06 release, we have added support for the new NVIDIA A100 features, new CUDA 11 and cuDNN 8 libraries in all the deep learning framework containers.
In this post, we focus on TensorFlow 1.15–based containers and pip wheels with support for NVIDIA GPUs, including the A100. We continue to release NVIDIA TensorFlow 1.15 every month to support the significant number of NVIDIA customers who are still using TensorFlow 1.x.
Read the full blog, Accelerating TensorFlow on NVIDIA A100 GPUs, on the NVIDIA Developer Blog.