Researchers from fast.ai announced a new speed record for training ImageNet to 93 percent accuracy in only 18 minutes.
Fast.ai alumnus Andrew Shaw and Defense Innovation Unit Experimental (DIU) researcher Yaroslav Bulatov achieved the speed record using 128 NVIDIA Tesla V100 Tensor Core GPUs on the Amazon Web Services (AWS) cloud, with the fastai and cuDNN-accelerated PyTorch libraries. For distributed computation, the team used the open-source NVIDIA Collective Communications Library (NCCL), which implements ring-style collectives that are integrated with PyTorch’s all-reduce distributed module.
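To illustrate the ring-style collective NCCL implements, here is a minimal, single-process simulation of ring all-reduce in plain Python. The worker count, vector length, and function name are illustrative, not from the project's code; real distributed training would instead call `torch.distributed.all_reduce` (or rely on DistributedDataParallel), with NCCL handling this pattern across GPUs.

```python
# Serial simulation of ring all-reduce: n workers each hold a gradient
# vector, and after the collective every worker holds the elementwise sum.
# Assumes the vector length is divisible by the number of workers.

def ring_all_reduce(grads):
    """Sum equal-length vectors across workers via the ring pattern.

    grads: list of per-worker lists; modified in place.
    """
    n = len(grads)              # number of workers in the ring
    size = len(grads[0])
    chunk = size // n           # each worker "owns" one chunk

    # Phase 1: scatter-reduce. In step s, worker w sends chunk
    # (w - s) mod n to its right neighbor, which accumulates it.
    # Messages are snapshotted first to model simultaneous sends.
    for s in range(n - 1):
        msgs = []
        for w in range(n):
            c = (w - s) % n
            msgs.append((w, c, grads[w][c * chunk:(c + 1) * chunk]))
        for w, c, data in msgs:
            dst = (w + 1) % n
            for j, v in enumerate(data):
                grads[dst][c * chunk + j] += v
    # Now worker w holds the fully reduced chunk (w + 1) mod n.

    # Phase 2: all-gather. In step s, worker w forwards its completed
    # chunk (w + 1 - s) mod n to its right neighbor, which overwrites.
    for s in range(n - 1):
        msgs = []
        for w in range(n):
            c = (w + 1 - s) % n
            msgs.append(((w + 1) % n, c, grads[w][c * chunk:(c + 1) * chunk]))
        for dst, c, data in msgs:
            grads[dst][c * chunk:(c + 1) * chunk] = data


# Example: 4 workers, worker w holds a vector of all (w + 1)s.
grads = [[float(w + 1)] * 8 for w in range(4)]
ring_all_reduce(grads)
# Every worker now holds the elementwise sum 1 + 2 + 3 + 4 = 10.
```

Each of the 2(n − 1) steps moves only 1/n of the data per worker, which is why the ring pattern keeps per-GPU bandwidth roughly constant as the number of GPUs grows.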
The 18-minute time beats the previous record by 40 percent.
“DIU and fast.ai will be releasing software to allow anyone to easily train and monitor their own distributed models on AWS, using the best practices developed in this project,” said Jeremy Howard, a founding researcher at fast.ai. “We entered this competition because we wanted to show that you don’t have to have huge resources to be at the cutting edge of AI research, and we were quite successful in doing so.”
The researchers said they were encouraged by speed records previously achieved by the AWS team on publicly available infrastructure.
“The set of tools developed by fast.ai focused on fast iteration with single-instance experiments, whilst the nexus-scheduler developed by DIU was focused on robustness and multi-machine experiments,” Howard stated.
The team said it achieved the speed record with 16 AWS instances, at a total compute cost of $40.
“We’re not even done yet – we have some ideas for further simple optimizations which we’ll be trying out,” Howard said. “There’s certainly plenty of room to go faster still.”
You can learn more about the record and fast.ai’s implementation on their blog.