RESTful Inference with the TensorRT Container and NVIDIA GPU Cloud

Once you have built, trained, tweaked and tuned your deep learning model, you need an inference solution that you need to deploy to a datacenter or to the cloud, and you need to get the maximum possible performance. You may have heard that NVIDIA TensorRT can maximize inference performance on NVIDIA GPUs, but how do you get from your trained model to a TensorRT-based inference engine in your datacenter or in the cloud? The new TensorRT container can help you solve this problem.

Based on NVIDIA Docker, the TensorRT container encapsulates all the libraries, executables and drivers you need to develop a TensorRT-based inference application. In just a few minutes you can go from nothing to having a local development environment for your inference solution that can also act as the basis for your own container-based datacenter or cloud deployment.

A new NVIDIA Developer Blog post introduces the TensorRT container and describes the simple REST server included in the container, which can act as a basis or inspiration for your own deployment solution.