Latest Updates to NVIDIA CUDA-X Libraries

Learn what’s new in the latest releases of NVIDIA’s CUDA-X Libraries and NGC.

Neural Modules

NVIDIA Neural Modules is a new open-source toolkit for researchers to build state-of-the-art neural networks for AI-accelerated speech applications. The early release of the toolkit includes:

  • Base modules for automatic speech recognition and natural language processing
  • GPU acceleration with mixed precision and multi-node distributed training
  • PyTorch support

Download Now

TensorRT 6

NVIDIA TensorRT is a platform for high-performance deep learning inference. This version of TensorRT includes:

  • BERT-Large inference in 5.8 ms on T4 GPUs
  • Dynamic shaped inputs to accelerate conversational AI, speech, and image segmentation apps
  • Dynamic input batch sizes to help speed up online apps with fluctuating workloads
  • New layers accelerate 3D image segmentation in healthcare apps
  • Optimizations in 2D image segmentation for industrial defect inspection
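
To make the dynamic-shape bullets above more concrete, here is a minimal sketch of the underlying idea: an engine is built for a range of input shapes (minimum, optimum, maximum), and any request whose shape falls inside that range can be served without rebuilding. The `ShapeRange` class and the example dimensions are hypothetical and written in plain Python for illustration; this is not the TensorRT API.

```python
from dataclasses import dataclass
from typing import Tuple

Shape = Tuple[int, ...]


@dataclass(frozen=True)
class ShapeRange:
    """Hypothetical helper (not a TensorRT class) describing a range of input shapes."""
    min_shape: Shape
    opt_shape: Shape
    max_shape: Shape

    def __post_init__(self):
        # All three shapes must describe tensors of the same rank.
        if not (len(self.min_shape) == len(self.opt_shape) == len(self.max_shape)):
            raise ValueError("min, opt and max must have the same rank")
        # Each dimension must satisfy min <= opt <= max.
        for lo, opt, hi in zip(self.min_shape, self.opt_shape, self.max_shape):
            if not lo <= opt <= hi:
                raise ValueError("each dimension needs min <= opt <= max")

    def accepts(self, shape: Shape) -> bool:
        """Return True if every dimension of `shape` lies within [min, max]."""
        return len(shape) == len(self.min_shape) and all(
            lo <= dim <= hi
            for lo, dim, hi in zip(self.min_shape, shape, self.max_shape)
        )


# Example values only: (batch size, sequence length) for a text model.
text_input = ShapeRange(min_shape=(1, 16), opt_shape=(8, 128), max_shape=(32, 384))

print(text_input.accepts((4, 64)))   # batch of 4, 64 tokens
print(text_input.accepts((64, 64)))  # batch of 64 exceeds the maximum
```

Keeping the batch dimension inside such a range is what lets an online service absorb fluctuating request volumes without fixing one batch size in advance.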

Get started with new Jupyter notebooks:

Download Now

cuDNN 7.6

NVIDIA CUDA Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks. This version of cuDNN includes:

  • Tensor Core accelerated 3D convolutions for VNet and UNet-3D models
  • Tensor Core acceleration for multi-head attention forward training and inference
  • Auto-padding for TensorFlow NHWC layout for faster kernel launch times
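
The auto-padding bullet above can be illustrated with a small sketch of what channel padding means for an NHWC tensor, where channels are the innermost dimension. The multiple of 8 is an assumption based on common FP16 Tensor Core alignment guidance, and both function names are hypothetical; this does not show how cuDNN implements the feature internally.

```python
from typing import List, Sequence


def padded_channels(channels: int, multiple: int = 8) -> int:
    """Round a channel count up to the next multiple (8 is an assumed alignment)."""
    if channels <= 0 or multiple <= 0:
        raise ValueError("channels and multiple must be positive")
    # Ceiling division, then scale back up to the aligned size.
    return -(-channels // multiple) * multiple


def pad_nhwc_pixel(pixel: Sequence[float], multiple: int = 8) -> List[float]:
    """Zero-pad the channel values of one pixel (the innermost NHWC dimension)."""
    extra = padded_channels(len(pixel), multiple) - len(pixel)
    return list(pixel) + [0.0] * extra


# A 3-channel RGB pixel is extended with zero-valued channels.
print(padded_channels(3))
print(pad_nhwc_pixel([0.2, 0.4, 0.6]))
```

Doing this padding inside the library means the framework does not have to reshape its tensors manually before calling a Tensor Core kernel.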

Download Now

NGC Updates

NGC provides containers, models and scripts with the latest performance enhancements. This month’s updates include:

GPU Optimized ASR and NLP Pipelines

Deep Learning Framework Updates

  • Native Automatic Mixed Precision support in TensorFlow 2.0 and MXNet 1.5
  • Additional support in PyTorch and MXNet for 3D convolutions, grouped convolutions, and depthwise separable convolutions
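
As background for the mixed precision bullet above, the sketch below shows why mixed precision training typically relies on loss scaling: very small gradient values underflow to zero in FP16, while a scaled copy survives and can be unscaled in higher precision. It uses only the Python standard library (`struct` with the half-precision `e` format); the gradient value and the scale factor of 1024 are illustrative assumptions, and this is not the TensorFlow or MXNet API.

```python
import struct


def to_fp16(value: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


grad = 1e-8          # illustrative gradient, below FP16's smallest subnormal
loss_scale = 1024.0  # assumed scale factor; frameworks usually choose this dynamically

unscaled = to_fp16(grad)            # underflows to 0.0 in FP16
scaled = to_fp16(grad * loss_scale) # representable in FP16 after scaling
recovered = scaled / loss_scale     # unscale in higher precision

print(unscaled)
print(recovered)
```

Automatic mixed precision applies this kind of scaling, along with per-operation precision choices, without requiring changes to the model code.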

TensorRT Inference Server

NVIDIA TensorRT Inference Server is an open-source inference microservice that lets you serve deep learning models in production while maximizing GPU utilization. This version of TensorRT Inference Server includes:

  • Deploy native PyTorch models without extra conversion
  • Deploy native ONNX models without extra conversion
  • Model Control API for dynamic model unloading/loading 
  • Host models in AWS S3 model repository
  • C++ library version of TensorRT Inference Server to bypass gRPC/HTTP interface
  • Store inputs and outputs locally in shared memory to reduce memory overhead
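
The shared memory bullet above rests on a general technique that can be sketched with Python's standard `multiprocessing.shared_memory` module (Python 3.8 or later): one side writes a tensor into a named region, and the other side attaches to that region by name instead of receiving a copy over the network. The "client" and "server" roles and the sample values are illustrative assumptions; this does not use the TensorRT Inference Server client API.

```python
import struct
from multiprocessing import shared_memory

# Illustrative input tensor: four float32 values.
values = [0.5, 1.5, 2.5, 3.5]
fmt = f"<{len(values)}f"
payload = struct.pack(fmt, *values)

# "Client" side: create a named region and write the input into it.
region = shared_memory.SharedMemory(create=True, size=len(payload))
try:
    region.buf[:len(payload)] = payload

    # "Server" side: attach to the same region by name and read in place.
    view = shared_memory.SharedMemory(name=region.name)
    try:
        received = list(struct.unpack_from(fmt, view.buf, 0))
    finally:
        view.close()
finally:
    # The creator releases and removes the region when it is done.
    region.close()
    region.unlink()

print(received)
```

Only the region name needs to travel between the two sides, which is why this approach can cut the copying that a request body over gRPC or HTTP would otherwise require.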

Download Now

Refer to each package’s release notes for additional information.