NVIDIA SDK Updated With New Releases of TensorRT, CUDA, and More

At NIPS 2017, NVIDIA announced new software releases for deep learning and HPC developers. The latest SDK updates include new capabilities and performance optimizations for TensorRT and the CUDA Toolkit, along with the new CUTLASS library project.

Here’s a detailed look at each of the software updates and the benefits they bring to developers and end users:

TensorRT 3

The TensorRT 3 production release is now available as a free download to all members of the NVIDIA Developer Program. Highlights of this release include:

  • Deliver up to 3.7x faster inference on Tesla V100 vs. Tesla P100 at under 7 ms real-time latency
  • Optimize and deploy TensorFlow models with up to 18x faster inference than TensorFlow framework inference on Tesla V100
  • Improve productivity with an easy-to-use Python API

Learn more and download TensorRT >>

TensorRT Container on NVIDIA GPU Cloud (NGC)

A new TensorRT inference container on NGC includes the latest TensorRT 3 release, a sample REST server for cloud inference, and a sample Open Neural Network Exchange (ONNX) model parser.

Sign up for an NGC account to get free access to the TensorRT container, for use on your desktop with a TITAN GPU or on NVIDIA Volta-enabled P3 instances on Amazon EC2.

CUTLASS

CUDA Templates for Linear Algebra Subroutines (CUTLASS) is a CUDA C++ template library that provides a high-level interface and building blocks for implementing fast, efficient GEMM (general matrix multiplication) operations for HPC and deep learning applications. CUTLASS is available as an open-source project on GitHub. It is still under development: it has been open-sourced for feedback and testing and is not yet ready for production use.
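To make the idea of GEMM building blocks more concrete, here is a minimal pure-Python sketch of the tiled (blocked) decomposition that GPU GEMM libraries such as CUTLASS organize their kernels around. It computes C = alpha * A * B + beta * C one output tile at a time. It accumulates over the K dimension in chunks and applies alpha and beta in a final epilogue step. The function name tiled_gemm and its tile-size parameters are illustrative assumptions, not CUTLASS's API. CUTLASS itself expresses these levels as C++ templates that map tiles onto thread blocks, warps, and threads on the GPU.

```python
def tiled_gemm(A, B, C, alpha=1.0, beta=0.0, tile_m=2, tile_n=2, tile_k=2):
    """Illustrative sketch (not CUTLASS's API): return alpha * A @ B + beta * C.

    A is M x K, B is K x N, C is M x N, all given as lists of lists.
    Tile sizes are hypothetical parameters chosen only for demonstration.
    """
    M, K = len(A), len(A[0])
    K2, N = len(B), len(B[0])
    if K != K2 or len(C) != M or len(C[0]) != N:
        raise ValueError("incompatible matrix shapes")

    out = [[0.0] * N for _ in range(M)]
    # Walk over output tiles; on a GPU each tile would typically map to a thread block.
    for m0 in range(0, M, tile_m):
        for n0 in range(0, N, tile_n):
            m1, n1 = min(m0 + tile_m, M), min(n0 + tile_n, N)
            # Per-tile accumulator, analogous to registers holding partial sums.
            acc = [[0.0] * (n1 - n0) for _ in range(m1 - m0)]
            # March along K one chunk at a time; a GPU kernel would stage these
            # A and B chunks in fast on-chip memory before multiplying them.
            for k0 in range(0, K, tile_k):
                k1 = min(k0 + tile_k, K)
                for i in range(m0, m1):
                    for j in range(n0, n1):
                        s = 0.0
                        for k in range(k0, k1):
                            s += A[i][k] * B[k][j]
                        acc[i - m0][j - n0] += s
            # Epilogue: scale the accumulated tile and blend it with the existing C tile.
            for i in range(m0, m1):
                for j in range(n0, n1):
                    out[i][j] = alpha * acc[i - m0][j - n0] + beta * C[i][j]
    return out
```

For example, tiled_gemm([[1, 2], [3, 4]], [[5, 6], [7, 8]], [[0, 0], [0, 0]]) returns [[19.0, 22.0], [43.0, 50.0]]. Changing the tile sizes changes only the order of the work, not the result (up to floating-point rounding). That freedom is what lets a GPU implementation keep each tile's operands close to the compute units.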

CUDA 9.1

Available later this month, CUDA 9.1 will bring new algorithms and optimizations that speed up AI and HPC apps on Volta GPUs. Highlights include:

  • Easily develop image augmentation algorithms for deep learning with new functions in NVIDIA Performance Primitives (NPP)
  • Run batched neural machine translation and sequence modeling operations on Volta Tensor Cores using new APIs in cuBLAS
  • Solve large 2D and 3D FFT problems more efficiently on multi-GPU systems with new heuristics in cuFFT
  • Launch CUDA kernels up to 12x faster with new performance optimizations
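The cuBLAS bullet refers to batched GEMM. A batched GEMM issues many small, independent matrix multiplies of the same shape as a single operation, a pattern common in sequence models. The pure-Python sketch below only spells out what a strided batched GEMM computes: C_i = alpha * A_i * B_i + beta * C_i for every matrix i in the batch, with consecutive matrices stored a fixed stride apart in flat buffers. The function name and argument order are hypothetical and do not match the cuBLAS signature. Note also that cuBLAS uses column-major storage and runs the whole batch on the GPU in one call, while this sketch uses row-major storage and plain loops.

```python
def strided_batched_gemm(a, b, c, m, n, k, batch, stride_a, stride_b, stride_c,
                         alpha=1.0, beta=0.0):
    """Hypothetical reference semantics, not the cuBLAS API.

    For i in range(batch): C_i = alpha * A_i @ B_i + beta * C_i, where
    A_i is m x k, B_i is k x n, and C_i is m x n, all row-major in flat
    lists. Matrix i of A starts at offset i * stride_a (likewise for B
    and C). c is updated in place and also returned.
    """
    for i in range(batch):
        oa, ob, oc = i * stride_a, i * stride_b, i * stride_c
        for row in range(m):
            for col in range(n):
                acc = 0.0
                for p in range(k):
                    # Element (row, p) of A_i times element (p, col) of B_i.
                    acc += a[oa + row * k + p] * b[ob + p * n + col]
                idx = oc + row * n + col
                c[idx] = alpha * acc + beta * c[idx]
    return c
```

In this sketch, setting stride_b=0 reuses a single B matrix for every batch entry. For example, strided_batched_gemm([1, 2, 3, 4, 1, 0, 0, 1], [5, 6, 7, 8], [0] * 8, 2, 2, 2, 2, 4, 0, 4) returns [19.0, 22.0, 43.0, 50.0, 5.0, 6.0, 7.0, 8.0].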

Register for the NVIDIA Developer Program to be notified when CUDA 9.1 is available for download >>

JetPack 3.2

JetPack 3.2 Developer Preview is now available. Its update to TensorRT 3.0 adds support for TensorFlow models and delivers up to 15% higher performance per watt for deep learning applications. In addition, the new L4T kernel supports Docker, and JetPack now supports Ubuntu 16.04 on the host PC.

Download JetPack 3.2 >>

About Brad Nemire

Brad Nemire is on the Developer Marketing team and loves reading about all of the fascinating research being done by developers using NVIDIA GPUs. Reach out to Brad on Twitter @BradNemire and let him know how you’re using GPUs to accelerate your research. Brad graduated from San Diego State University and currently resides in San Jose, CA.