Neural Machine Translation Now Available with TensorRT

NVIDIA released TensorRT 4 with new features to accelerate inference of neural machine translation (NMT) applications on GPUs. Neural machine translation offers AI-based text translation for a large number of consumer applications, including websites, road signs, foreign-language subtitle generation, and more.
The new TensorRT 4 release adds support for new layers used in RNN-based models, such as Batch MatrixMultiply, Constant, Gather, RaggedSoftMax, Reduce, RNNv2, and TopK. These layers allow application developers to accelerate the most compute-intensive portions of an NMT model easily with TensorRT.
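To give a sense of what a layer such as RaggedSoftMax contributes to an NMT model, here is a minimal sketch in plain Python. It is not the TensorRT API: the function name `ragged_softmax` and its arguments are hypothetical and chosen for illustration only. It assumes the usual attention setting, where each sentence in a batch has a different length, so the softmax over attention scores should cover only the real tokens of each sentence and give padded positions zero weight.

```python
import math

def ragged_softmax(scores, lengths):
    """Row-wise softmax that covers only the first lengths[i] entries of row i.

    Padded positions receive weight 0.0. Assumes every length is at least 1.
    """
    result = []
    for row, length in zip(scores, lengths):
        valid = row[:length]
        peak = max(valid)  # subtract the row maximum for numerical stability
        exps = [math.exp(s - peak) for s in valid]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Pad back to the original row width with zeros
        result.append(weights + [0.0] * (len(row) - length))
    return result

# Two sentences padded to width 4; the first has only 2 real tokens,
# so its large padded score (9.0) must not influence the result.
scores = [[2.0, 1.0, 0.5, 9.0], [0.1, 0.2, 0.3, 0.4]]
weights = ragged_softmax(scores, lengths=[2, 4])
```

In this sketch the first row's weights sum to 1 over its two real tokens, and the padded positions are exactly zero regardless of the scores stored there.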
In terms of performance, when beam search was tested on the data-writer-benchmark component, inference ran 170 times faster than on a CPU-only platform at batch size 1 and over 100 times faster at batch size 64.
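For readers unfamiliar with beam search, the following is a minimal sketch of a single decoding step in plain Python. It is an illustration under stated assumptions, not TensorRT code and not the benchmarked implementation: `beam_step` and its parameters are hypothetical names. The sketch shows why operations like TopK and Gather matter here: each step scores every possible continuation, keeps the best few (a top-k selection), and then looks up the parent hypothesis of each survivor (a gather).

```python
import heapq
import math

def beam_step(beam_scores, beam_tokens, log_probs, beam_width):
    """Advance beam search by one token.

    beam_scores: cumulative log-probability of each live hypothesis
    beam_tokens: token list of each live hypothesis
    log_probs:   log_probs[i][v] is the log-probability of vocabulary id v
                 following hypothesis i
    """
    # Score every (hypothesis, token) continuation
    candidates = [
        (beam_scores[i] + lp, i, v)
        for i, row in enumerate(log_probs)
        for v, lp in enumerate(row)
    ]
    # Top-k selection: keep the beam_width best continuations, best first
    best = heapq.nlargest(beam_width, candidates)
    # Gather: fetch each survivor's parent hypothesis and extend it
    new_scores = [score for score, _, _ in best]
    new_tokens = [beam_tokens[parent] + [token] for _, parent, token in best]
    return new_scores, new_tokens

# Two live hypotheses and a toy vocabulary of three tokens
beam_scores = [math.log(0.6), math.log(0.4)]
beam_tokens = [[1], [2]]
log_probs = [
    [math.log(0.5), math.log(0.3), math.log(0.2)],
    [math.log(0.1), math.log(0.1), math.log(0.8)],
]
new_scores, new_tokens = beam_step(beam_scores, beam_tokens, log_probs, beam_width=2)
```

With these toy numbers the best continuation extends the second hypothesis with token 2 (probability 0.4 × 0.8 = 0.32), followed by the first hypothesis with token 0 (0.6 × 0.5 = 0.30), so `new_tokens` is `[[2, 2], [1, 0]]`.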
TensorRT, NVIDIA’s programmable inference accelerator, helps optimize and generate runtime engines for deploying deep learning inference applications to production environments. The GNMT model performed inference 60x faster using TensorRT on Tesla V100 GPUs compared with CPU-only platforms.
