Facebook published a paper today detailing how they trained a deep learning model on nearly 1.3 million images in under an hour using 256 Tesla P100 GPUs, a task that previously took days on a single system.
The team reduced the training time of a ResNet-50 deep learning model on ImageNet from 29 hours to one by distributing training in larger minibatches across more GPUs. Previously, minibatches of 256 images were spread across eight Tesla P100 GPUs; the new work achieves the same level of accuracy with minibatch sizes as large as 8,192 images distributed across 256 GPUs.
According to the paper, “to achieve this result, we adopt a linear scaling rule for adjusting learning rates as a function of minibatch size and develop a new warmup scheme that overcomes optimization challenges early in training.” They were able to achieve near-linear SGD scaling by using an optimized allreduce implementation. For the local reduction, they used the NVIDIA Collective Communications Library (NCCL), which implements multi-GPU collective communication primitives that are performance-optimized for NVIDIA GPUs.
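The two learning-rate techniques quoted above are simple to state in code. The sketch below is illustrative, not Facebook's implementation: the function names are our own, and it assumes the paper's reference setup of a base learning rate of 0.1 per 256 images with a gradual warmup over the first few epochs.

```python
def scaled_lr(base_lr, base_batch, batch):
    # Linear scaling rule: when the minibatch grows by a factor k,
    # multiply the learning rate by the same factor k.
    return base_lr * batch / base_batch

def warmup_lr(target_lr, start_lr, epoch, warmup_epochs=5):
    # Gradual warmup: ramp the learning rate linearly from start_lr
    # up to the scaled target_lr over the first warmup_epochs epochs,
    # avoiding instability early in large-minibatch training.
    if epoch < warmup_epochs:
        return start_lr + (target_lr - start_lr) * epoch / warmup_epochs
    return target_lr
```

For example, scaling the paper's reference rate of 0.1 (for 256 images) to a minibatch of 8,192 gives `scaled_lr(0.1, 256, 8192)`, i.e. a learning rate of 3.2, which `warmup_lr` approaches linearly from 0.1 during warmup.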
Facebook used the open source deep learning framework Caffe2 and their Big Basin GPU server, which houses eight NVIDIA Tesla P100 GPU accelerators interconnected with NVIDIA NVLink.
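The gradient aggregation described above is hierarchical: gradients are first reduced locally across the eight GPUs inside each Big Basin server (the role NCCL plays), then combined across servers via allreduce, and the result is shared back to every GPU. A minimal in-process sketch of that data flow, with illustrative names and plain Python sums standing in for the actual collectives:

```python
def hierarchical_allreduce(machines):
    """Sketch of two-level gradient aggregation.

    `machines[m][g]` is the gradient list held by GPU g on machine m.
    Phase 1: intra-machine reduction (done by NCCL on real hardware).
    Phase 2: inter-machine allreduce of the per-machine sums.
    Phase 3: broadcast the global sum back to every GPU.
    """
    # Phase 1: sum gradients elementwise across GPUs within each machine
    local_sums = [[sum(vals) for vals in zip(*gpus)] for gpus in machines]
    # Phase 2: sum the per-machine results elementwise across machines
    global_sum = [sum(vals) for vals in zip(*local_sums)]
    # Phase 3: every GPU on every machine receives the same global sum
    return [[list(global_sum) for _ in gpus] for gpus in machines]
```

After this step each worker holds identical summed gradients, so all 256 GPUs apply the same SGD update and stay in sync; a real implementation overlaps these phases with backpropagation rather than running them afterwards.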