Fujitsu Breaks ImageNet Record with V100 Tensor Core GPUs

Researchers from Fujitsu have announced a new speed record: training ResNet-50 on ImageNet to 75% accuracy in 74.7 seconds. That beats the previous record, set by Sony in November of last year, by more than 47 seconds.

The team achieved the record using 2,048 NVIDIA Tesla V100 GPUs and the MXNet deep learning framework on the AI Bridging Cloud Infrastructure (ABCI) system at the University of Tokyo in Japan.

The ABCI system is Japan’s fastest supercomputer and is among the top 10 fastest supercomputers in the world. It is powered by over 4,300 NVIDIA V100 GPUs connected via NVLink, and it is the same system Sony used to set the previous record.

“Based on the technology Fujitsu Laboratories has cultivated over its HPC development, the company has now developed a technology to expand computation volume per GPU without compromising training accuracy,” the company wrote in a post.

To compensate for the drop in validation accuracy that can occur when training DNNs with large mini-batches, the team said it “used several techniques to increase mini-batch size, without compromising validation accuracy.”

“Distributed deep learning with data parallelism is known to be an effective approach to accelerate the training on clusters,” the researchers said. “In this approach, all processes launched on the cluster have the same DNN model and weights.”
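
The data-parallel idea in that quote can be illustrated with a toy example. The following is a minimal sketch, not Fujitsu's implementation: it assumes a one-parameter linear model, four simulated workers, and a hypothetical `allreduce_mean` helper standing in for a real all-reduce across GPUs. Every worker keeps its own replica of the weight, computes a gradient on its own shard of data, and applies the same averaged gradient, so the replicas never diverge.

```python
# Toy data-parallel SGD for a 1-D linear model y = w * x with squared error.
# All names are hypothetical; a real system would use a framework such as
# MXNet plus a collective-communication library instead of Python lists.

def local_gradient(w, shard):
    """Mean gradient of 0.5 * (w*x - y)^2 over one worker's shard."""
    return sum((w * x - y) * x for x, y in shard) / len(shard)

def allreduce_mean(values):
    """Stand-in for an all-reduce: every worker receives the same average."""
    mean = sum(values) / len(values)
    return [mean] * len(values)

def data_parallel_step(weights, shards, lr):
    """One synchronous step; weights[i] is worker i's replica."""
    grads = [local_gradient(w, s) for w, s in zip(weights, shards)]
    synced = allreduce_mean(grads)
    return [w - lr * g for w, g in zip(weights, synced)]

# Four workers, identical initial weight, equal-sized shards of y = 3x data.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
shards = [data[i::4] for i in range(4)]
weights = [0.0] * 4

for _ in range(50):
    weights = data_parallel_step(weights, shards, lr=0.01)

print(weights)  # all four replicas hold the same value, close to 3.0
```

Because the shards are the same size, averaging the per-worker gradients gives the same result as computing the gradient over the full mini-batch on one device, which is why adding workers increases the effective mini-batch size.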

The researchers also used Tensor Cores for mixed precision.

The optimized DNN framework completed ResNet-50 training on ImageNet in 74.7 seconds with 75.08% validation accuracy.

The team was able to use a very large mini-batch size of 81,920 and maintain an accuracy of 75.08% (shown as the third data point on the above graph).
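
To put the mini-batch size in perspective, the short calculation below works out what it implies per GPU and per epoch. The mini-batch size and GPU count come from the article; the training-set size of 1,281,167 images is an assumption (the standard ILSVRC-2012 figure), since the article does not state it.

```python
import math

TOTAL_MINI_BATCH = 81_920  # from the article
NUM_GPUS = 2_048           # from the article
# Assumption: ILSVRC-2012 training set size, not stated in the article.
IMAGENET_TRAIN_IMAGES = 1_281_167

per_gpu_batch = TOTAL_MINI_BATCH // NUM_GPUS
steps_per_epoch = math.ceil(IMAGENET_TRAIN_IMAGES / TOTAL_MINI_BATCH)

print(per_gpu_batch)    # 40
print(steps_per_epoch)  # 16
```

Under that assumption, each GPU processes 40 images per step, and one pass over the training set takes only 16 weight updates, which helps explain why very large mini-batches make it harder to reach the target accuracy.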

A number of NVIDIA technologies were used to achieve this milestone, including Layer-wise Adaptive Rate Scaling (LARS).
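
For readers unfamiliar with LARS, here is a minimal sketch of its core idea: each layer gets its own learning rate, scaled by the ratio of the weight norm to the gradient norm. This is a simplified, momentum-free version in plain Python; the function names and the hyperparameter values (`trust_coef`, `weight_decay`) are illustrative assumptions, not the settings Fujitsu used.

```python
import math

def l2_norm(values):
    return math.sqrt(sum(v * v for v in values))

def lars_step(weights, grads, global_lr, trust_coef=0.001, weight_decay=0.0):
    """One momentum-free LARS update for a single layer's flattened weights."""
    w_norm = l2_norm(weights)
    g_norm = l2_norm(grads)
    denom = g_norm + weight_decay * w_norm
    # Fall back to a ratio of 1.0 when either norm is zero.
    local_lr = trust_coef * w_norm / denom if w_norm > 0 and denom > 0 else 1.0
    return [
        w - global_lr * local_lr * (g + weight_decay * w)
        for w, g in zip(weights, grads)
    ]

# Same weights, gradients that differ in scale by 1000x.
layer_a = lars_step([1.0, 2.0, 2.0], [0.3, 0.0, 0.4], global_lr=1.0)
layer_b = lars_step([1.0, 2.0, 2.0], [300.0, 0.0, 400.0], global_lr=1.0)

print(layer_a)
print(layer_b)
```

Both calls produce essentially the same updated weights, because the per-layer rate cancels out the gradient's magnitude. That normalization is what lets training stay stable when the global learning rate is raised to match a very large mini-batch.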

The work was recently published on arXiv and on a Fujitsu blog.