Learn what’s new in the latest releases of NVIDIA’s CUDA-X AI libraries and NGC. Refer to each package’s release notes in the documentation for additional information.
GPU-Accelerated TensorFlow 1.x is Now Available
NVIDIA released an open-source project that delivers GPU-accelerated TensorFlow 1.x optimized for A100, V100, and T4 GPUs. This release is based on TensorFlow 1.15. With this version you get:
- Latest features in CUDA 11
- Optimizations from libraries such as cuDNN 8
- Enhancements for XLA:GPU, AMP, and TensorFlow-TensorRT integration
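As a hedged sketch, AMP and XLA:GPU can typically be switched on in TF 1.x builds through environment variables set before TensorFlow is imported. The variable names below are assumptions drawn from NVIDIA's TF 1.x container documentation; verify them against the release notes:

```python
import os

# Assumed variable names (verify against NVIDIA's TF 1.x documentation);
# set these before `import tensorflow` so the runtime picks them up.
os.environ["TF_ENABLE_AUTO_MIXED_PRECISION"] = "1"  # automatic mixed precision (AMP)
os.environ["TF_XLA_FLAGS"] = "--tf_xla_auto_jit=2"  # XLA:GPU auto-clustering
```

Setting flags via the environment keeps model code unchanged, which is convenient when testing whether AMP or XLA helps a given training script.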
TensorRT 7.1 is Now Available
NVIDIA TensorRT is an SDK for high-performance deep learning inference that minimizes latency and maximizes throughput in production. In addition to bug fixes and minor updates, this version includes:
- Support for the latest A100 GPUs
- INT8 precision optimizations for BERT that deliver 6x higher performance than V100 GPUs
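For intuition on where the INT8 speedup comes from: INT8 inference maps FP32 tensors onto 8-bit integers via linear quantization. The sketch below shows generic symmetric INT8 quantization in pure Python; it is an illustration of the idea only, not TensorRT's actual calibration pipeline:

```python
def quantize_int8(values):
    # Symmetric linear quantization: map [-amax, amax] onto [-127, 127]
    amax = max(abs(v) for v in values)
    scale = amax / 127.0
    return [max(-127, min(127, round(v / scale))) for v in values], scale

def dequantize(quantized, scale):
    # Recover approximate FP32 values from INT8 codes
    return [q * scale for q in quantized]
```

Arithmetic on the 8-bit codes uses a quarter of the memory bandwidth of FP32 and maps onto fast integer math units, at the cost of a small, bounded quantization error.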
cuDNN 8 GA is Now Available
The NVIDIA CUDA Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks. This version of cuDNN includes:
- Kernels tuned for peak performance on NVIDIA A100 GPUs, including the new TensorFloat-32 (TF32), FP16, and FP32 precisions
- A redesigned low-level API that provides direct access to cuDNN kernels for greater control and performance tuning
- New optimizations for computer vision, speech, and language understanding networks
- A new API to fuse operators and accelerate convolutional neural networks
NVIDIA NeMo 0.11
NVIDIA NeMo is an open-source toolkit for building, training, and fine-tuning GPU-accelerated, state-of-the-art conversational AI models through API-compatible modules. This version of NeMo includes:
- Neural Graphs to flexibly save and load GPU-optimized NeMo modules and configurations
- New speech-based pre-trained models and collections for Voice Activity Detection (VAD) and MatchboxNet that can speed up training and fine-tuning tasks by up to 3x
- New state-of-the-art (SOTA) NLP use cases and collections for BioBERT and MegatronBERT
NVIDIA DALI
The NVIDIA Data Loading Library (DALI) is a portable, open-source, GPU-accelerated library for decoding and augmenting images and videos to accelerate deep learning applications. This version of DALI includes:
- Support for A100 GPUs, achieving over 2x speedup using the JPEG hardware decoder
- New audio processing operators to accelerate ASR pipelines
- New Jupyter notebooks demonstrating how to load and decode audio data, and perform audio feature extraction (LINK)
NGC Updates (includes framework updates)
NGC provides containers, models and scripts with the latest performance enhancements. This month’s updates include:
- The 20.06 deep learning framework container releases for PyTorch, TensorFlow, and MXNet are the first to support the latest NVIDIA A100 GPUs along with the latest CUDA 11 and cuDNN 8 libraries. TF32, a new precision, is enabled by default in the containers and provides up to 6x out-of-the-box performance improvement for deep learning training compared to V100 FP32.
- Starting with 20.06, the PyTorch containers support torch.cuda.amp, the mixed precision functionality available in PyTorch core as the AMP package. Compared to apex.amp, torch.cuda.amp is more flexible and intuitive. More details can be found in this blog post from PyTorch.
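A key job of torch.cuda.amp's gradient scaler is loss scaling, which keeps small FP16 gradients from underflowing to zero. The pure-Python sketch below demonstrates the underflow problem itself (using the standard library's half-precision codec), not the torch API:

```python
import struct

def to_fp16(x):
    # Round-trip a Python float through IEEE-754 half precision ("<e")
    return struct.unpack("<e", struct.pack("<e", x))[0]

grad = 1e-8                     # a tiny FP32 gradient value
assert to_fp16(grad) == 0.0     # underflows to zero in FP16

scale = 2.0 ** 16               # a typical loss scale
scaled = to_fp16(grad * scale)  # the scaled value is representable in FP16
assert scaled != 0.0
recovered = scaled / scale      # unscale in FP32 before the optimizer step
```

Scaling the loss (and hence all gradients) by a large factor before the backward pass shifts small values into FP16's representable range; unscaling in FP32 before the optimizer step recovers the true magnitudes.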