NVIDIA CUDA-X AI is a collection of deep learning libraries that researchers and software developers use to build high-performance GPU-accelerated applications for conversational AI, recommendation systems, and computer vision. CUDA-X AI libraries deliver world-leading performance for both training and inference across industry benchmarks such as MLPerf.
Learn what’s new in the latest releases of CUDA-X AI libraries and NGC.
Refer to each package’s release notes in the documentation for additional information.
The NVIDIA CUDA Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks. This version of cuDNN includes:
- Support for BFloat16 for CNNs on NVIDIA Ampere architecture GPUs.
- A new, easy-to-use C++ front-end API, available as open source, that wraps the flexible v8 backend C API.
- Flexible fusion of operators such as convolutions, point-wise operations, and reductions to speed up CNNs.
- New optimizations for computer vision, speech, and natural language understanding networks.
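To make the BFloat16 item above concrete: bfloat16 keeps the sign bit and all 8 exponent bits of a 32-bit float but only the top 7 mantissa bits, so it covers the same dynamic range as float32 at lower precision. The following is a minimal sketch of that conversion in plain Python. It illustrates the number format only and is not a cuDNN API; the function names are hypothetical, and round-to-nearest-even is an assumed rounding mode.

```python
import struct


def float32_to_bfloat16_bits(x: float) -> int:
    """Return the 16-bit bfloat16 pattern for x (hypothetical helper)."""
    # Reinterpret the value as its 32-bit IEEE 754 single-precision pattern.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # NaN: keep the top 16 bits and force a mantissa bit so it stays a NaN.
    if (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF):
        return (bits >> 16) | 0x0040
    # Round to nearest even (assumed mode), then keep the upper 16 bits.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bfloat16_bits_to_float32(b: int) -> float:
    """Expand a bfloat16 pattern back to a float by zero-filling the low bits."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]


def roundtrip(x: float) -> float:
    """Value that x takes after being stored as bfloat16."""
    return bfloat16_bits_to_float32(float32_to_bfloat16_bits(x))
```

For example, `roundtrip(3.14159)` gives `3.140625`: the magnitude survives, but only about two to three decimal digits of precision remain.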
NVIDIA TensorRT is a platform for high-performance deep learning inference. This version of TensorRT includes:
- New debugging APIs – ONNX GraphSurgeon, Polygraphy, and the PyTorch Quantization Toolkit
- Support for Python 3.8
In addition, this version includes several bug fixes and documentation upgrades.
Triton Inference Server 2.6
Triton is open-source inference serving software designed to maximize performance and simplify production deployment at scale. This version of Triton includes:
- An alpha version of the Windows build, which supports gRPC and the TensorRT backend
- Initial release of Model Analyzer, a tool that helps users select the model configuration that maximizes performance in Triton.
- Support for Ubuntu 20.04 – Triton provides support for the latest version of Ubuntu, which comes with additional security updates.
- Native support in DeepStream – Triton on DeepStream can run inference on video analytics workflows on the edge or on the cloud with Kubernetes.
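For readers new to Triton, here is a rough sketch of what a client request looks like. Besides gRPC, Triton 2.x also exposes an HTTP/REST endpoint that accepts JSON inference requests at `v2/models/<model>/infer`. The helpers below only build the URL and request body with the standard library; the helper names, the model name `my_model`, the tensor names `INPUT0` and `OUTPUT0`, and the address `localhost:8000` are all assumptions for illustration, and a real model defines its own tensor names, shapes, and datatypes.

```python
import json


def infer_url(base_url: str, model_name: str) -> str:
    """Build the inference endpoint URL for a model (hypothetical helper)."""
    return f"{base_url.rstrip('/')}/v2/models/{model_name}/infer"


def build_infer_request(input_name, shape, datatype, data, output_names=()):
    """Serialize one input tensor into a JSON inference request body.

    `data` is the tensor flattened in row-major order.
    """
    expected = 1
    for dim in shape:
        expected *= dim
    if expected != len(data):
        raise ValueError(f"shape {list(shape)} needs {expected} values, got {len(data)}")

    body = {
        "inputs": [
            {
                "name": input_name,
                "shape": list(shape),
                "datatype": datatype,
                "data": list(data),
            }
        ]
    }
    # Optionally name the outputs the client wants returned.
    if output_names:
        body["outputs"] = [{"name": name} for name in output_names]
    return json.dumps(body)


if __name__ == "__main__":
    # Assumed server address and model name, for illustration only.
    print(infer_url("http://localhost:8000", "my_model"))
    print(build_infer_request("INPUT0", [1, 4], "FP32", [1.0, 2.0, 3.0, 4.0], ["OUTPUT0"]))
```

The resulting string can be sent as the body of an HTTP POST to the URL with any HTTP client.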
NGC Container Registry
NGC is the hub for GPU-optimized AI/ML/HPC application containers, models, and SDKs. It simplifies software development and deployment so users can achieve faster time to solution. This month’s updates include:
- NGC catalog in the AWS Marketplace – users can now pull the software directly from the AWS portal
- Containers for latest versions of NVIDIA AI software including Triton Inference Server, TensorRT, and deep learning frameworks such as PyTorch
The NVIDIA Data Loading Library (DALI) is a portable, open-source GPU-accelerated library for decoding and augmenting images and videos to accelerate deep learning applications. This version of DALI includes:
- New easy-to-use functional API (experimental).
- DALI integration with Triton Inference Server to accelerate inference pipelines. DALI pipelines can now be run within Triton on the server side.
- New Jupyter Notebooks: Geometric Transform and Reductions
- New and improved operators for 3D/volumetric data and video processing.
nvJPEG2000 is a new library for GPU-accelerated JPEG2000 image decoding. This version of nvJPEG2000 includes:
- Support for Linux and Windows operating systems
- Up to 4x faster lossless decoding with the 5-3 wavelet transform and up to 7x faster lossy decoding with the 9-7 wavelet transform
- Bitstreams with multiple tiles are now supported.
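Some background on why the 5-3 path is lossless: in JPEG 2000, the 5-3 wavelet is computed with integer lifting steps that can be undone exactly, whereas the 9-7 wavelet uses real-valued filter coefficients and is irreversible. The sketch below shows one level of a 1D reversible 5-3 lifting transform in plain Python. It is an illustration of the technique only, not the nvJPEG2000 API; the function names are hypothetical, and it assumes an even-length integer signal with symmetric extension at the boundaries.

```python
def forward_53(x):
    """One level of the reversible 5-3 lifting transform on integers.

    Returns (s, d): the low-pass and high-pass coefficient lists.
    """
    if len(x) < 2 or len(x) % 2:
        raise ValueError("this sketch assumes an even-length signal")
    even, odd = x[0::2], x[1::2]
    half = len(even)

    # Predict step: each odd sample minus the floored mean of its even neighbors.
    d = []
    for n in range(half):
        right = even[n + 1] if n + 1 < half else even[n]  # symmetric extension
        d.append(odd[n] - ((even[n] + right) >> 1))

    # Update step: each even sample plus a rounded quarter of nearby details.
    s = []
    for n in range(half):
        left = d[n - 1] if n > 0 else d[0]  # symmetric extension
        s.append(even[n] + ((left + d[n] + 2) >> 2))
    return s, d


def inverse_53(s, d):
    """Undo forward_53 exactly by reversing the lifting steps in order."""
    half = len(s)

    # Undo the update step to recover the even samples.
    even = []
    for n in range(half):
        left = d[n - 1] if n > 0 else d[0]
        even.append(s[n] - ((left + d[n] + 2) >> 2))

    # Undo the predict step to recover the odd samples, then interleave.
    x = []
    for n in range(half):
        right = even[n + 1] if n + 1 < half else even[n]
        x.append(even[n])
        x.append(d[n] + ((even[n] + right) >> 1))
    return x
```

Because every step adds or subtracts an integer that the inverse can recompute from the same inputs, `inverse_53(*forward_53(x))` returns `x` bit for bit, which is what makes lossless coding possible.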