Google Open-Sources Image Captioning Intelligence

Google has released the latest version of its automatic image captioning model, which is more accurate and much faster to train than the original system.
“The TensorFlow implementation released today achieves the same level of accuracy with significantly faster performance: time per training step is just 0.7 seconds in TensorFlow compared to 3 seconds in DistBelief (a system Google previously used for generating image captions) on an NVIDIA K20 GPU, meaning that total training time is just 25 percent of the time previously required,” Chris Shallue, Software Engineer on the Google Brain Team, wrote in a blog post.
Using CUDA and the TensorFlow deep learning framework, Google trains the model, called Show and Tell, by showing it images alongside captions that people wrote for those images. Sometimes, if the model thinks it sees something going on in a new image that’s exactly like a previous image it has seen, it falls back on the caption for that previous image. At other times, Show and Tell is able to come up with original captions. “Moreover,” Shallue wrote, “it learns how to express that knowledge in natural-sounding English phrases despite receiving no additional language training other than reading the human captions.”

Figure: The model generates a completely new caption using concepts learned from similar scenes in the training set.

The initial training phase took nearly two weeks on a single Tesla K20 GPU, and Google notes that running the same code on a CPU would be 10 times slower.
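The timing figures quoted above can be checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope illustration, not anything from Google's release: the function names are hypothetical, it assumes both systems need the same number of training steps, and it rounds "nearly two weeks" up to 14 days.

```python
# Figures quoted in the article (K20 GPU, time per training step).
TF_STEP_SECONDS = 0.7          # TensorFlow implementation
DISTBELIEF_STEP_SECONDS = 3.0  # earlier DistBelief implementation
GPU_TRAINING_DAYS = 14         # assumption: "nearly two weeks" rounded up
CPU_SLOWDOWN = 10              # quoted CPU-versus-GPU slowdown factor


def step_time_ratio(new_seconds: float, old_seconds: float) -> float:
    """Return the new step time as a fraction of the old step time."""
    return new_seconds / old_seconds


def estimated_cpu_days(gpu_days: float, slowdown: float) -> float:
    """Scale GPU training time by a constant slowdown factor."""
    return gpu_days * slowdown


ratio = step_time_ratio(TF_STEP_SECONDS, DISTBELIEF_STEP_SECONDS)
cpu_days = estimated_cpu_days(GPU_TRAINING_DAYS, CPU_SLOWDOWN)

print(f"TensorFlow step time as a share of DistBelief's: {ratio:.0%}")
print(f"Per-step speedup: {1 / ratio:.1f}x")
print(f"Rough CPU training estimate: {cpu_days:.0f} days")
```

Under these assumptions the per-step ratio works out to roughly a quarter, in line with the 25 percent figure in the quote, and a CPU-only run would take on the order of 20 weeks.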