Fast.AI Breaks ImageNet Record with NVIDIA V100 Tensor Core GPUs

Researchers from fast.ai announced a new speed record for training ImageNet to 93 percent accuracy in only 18 minutes.
Fast.ai alumnus Andrew Shaw and Defense Innovation Unit Experimental (DIU) researcher Yaroslav Bulatov achieved the speed record using 128 NVIDIA Tesla V100 Tensor Core GPUs on the Amazon Web Services (AWS) cloud, with the fastai and cuDNN-accelerated PyTorch libraries. For distributed computation, the team used the open-source NVIDIA Collective Communications Library (NCCL), which implements ring-style collectives and is integrated with PyTorch’s all-reduce distributed module.
The result is 40 percent faster than the previous record.
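To give a sense of what a ring-style all-reduce does, here is a minimal pure-Python simulation. It is only an illustrative sketch, not NCCL's or PyTorch's implementation: the function name `ring_all_reduce` is hypothetical, the "workers" are plain lists in one process, and it assumes the vector length divides evenly by the number of workers. Each worker passes one chunk to its right-hand neighbor per step, first summing chunks (reduce-scatter) and then circulating the finished sums (all-gather).

```python
def ring_all_reduce(worker_data):
    """Simulate a sum all-reduce over a ring of workers (illustrative only).

    worker_data: list of equal-length lists, one per simulated worker.
    Returns a new list of lists; every worker ends with the elementwise sum.
    """
    n = len(worker_data)
    length = len(worker_data[0])
    if any(len(v) != length for v in worker_data):
        raise ValueError("all workers must hold vectors of the same length")
    if length % n != 0:
        # Simplifying assumption of this sketch: equal-sized chunks.
        raise ValueError("vector length must be divisible by the worker count")

    chunk = length // n
    bufs = [list(v) for v in worker_data]  # work on copies

    def seg(c):
        # Slice covering chunk index c.
        return slice(c * chunk, (c + 1) * chunk)

    # Phase 1: reduce-scatter. After n - 1 steps, worker r holds the
    # fully summed chunk (r + 1) % n.
    for step in range(n - 1):
        # Snapshot outgoing chunks so all sends in a step are simultaneous.
        sends = [((r - step) % n, bufs[r][seg((r - step) % n)]) for r in range(n)]
        for r, (c, payload) in enumerate(sends):
            dst = (r + 1) % n
            bufs[dst][seg(c)] = [a + b for a, b in zip(bufs[dst][seg(c)], payload)]

    # Phase 2: all-gather. Finished chunks travel around the ring,
    # overwriting the stale partial sums on each worker.
    for step in range(n - 1):
        sends = [((r + 1 - step) % n, bufs[r][seg((r + 1 - step) % n)]) for r in range(n)]
        for r, (c, payload) in enumerate(sends):
            dst = (r + 1) % n
            bufs[dst][seg(c)] = payload

    return bufs


if __name__ == "__main__":
    grads = [[1, 2, 3, 4], [10, 20, 30, 40]]  # two simulated workers
    print(ring_all_reduce(grads))
```

In this scheme each worker sends and receives only one chunk per step, so the per-worker traffic stays roughly constant as workers are added, which is why ring collectives suit gradient averaging across many GPUs.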
“DIU and fast.ai will be releasing software to allow anyone to easily train and monitor their own distributed models on AWS, using the best practices developed in this project,” said Jeremy Howard, a founding researcher at fast.ai. “We entered this competition because we wanted to show that you don’t have to have huge resources to be at the cutting edge of AI research, and we were quite successful in doing so.”
The researchers said they were encouraged by previous speed records achieved on publicly available machines by the AWS team.
“The set of tools developed by fast.ai focused on fast iteration with single-instance experiments, whilst the nexus-scheduler developed by DIU was focused on robustness and multi-machine experiments,” Howard stated.

A snippet of the Jupyter Notebook comparing different cropping approaches. ‘Center Crop Image’ is the original photo, ‘FastAi rectangular’ is fast.ai’s new method, ‘Imagenet Center’ is the standard approach, and ‘Test Time Augmentation’ is an example from the multi-crop approach.

The team says they achieved the speed record with 16 AWS instances, at a total compute cost of $40.
“We’re not even done yet – we have some ideas for further simple optimizations which we’ll be trying out,” Howard said. “There’s certainly plenty of room to go faster still.”
You can learn more about the record and fast.ai’s implementation on their blog.