Scaling TensorFlow and Caffe to 256 GPUs

IBM Research unveiled a “Distributed Deep Learning” (DDL) library that enables cuDNN-accelerated deep learning frameworks such as TensorFlow, Caffe, Torch, and Chainer to scale across tens of IBM servers and hundreds of GPUs.

“With the DDL library, it took us just 7 hours to train ImageNet-22K using ResNet-101 on 64 IBM Power Systems servers that have a total of 256 NVIDIA P100 GPU accelerators in them,” said Sumit Gupta, VP, HPC, AI & Machine Learning at IBM Cognitive Systems. “16 days down to 7 hours changes the workflow of data scientists. That’s a 58x speedup!”

According to the researchers’ paper, the team set records for image recognition accuracy and training time when using the new library and 256 GPUs.
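Libraries like DDL scale data-parallel training by summing (then averaging) gradients across all GPUs each step, typically with a ring-style allreduce; the paper describes a topology-aware, multi-ring variant. As a rough illustration only, the sketch below simulates the basic ring allreduce on plain Python lists standing in for per-GPU gradient buffers (`ring_allreduce` and its chunking scheme are simplified teaching code, not DDL's API):

```python
def ring_allreduce(grads):
    """Simulate a ring allreduce: every worker ends up with the
    element-wise sum of all workers' gradient vectors.

    Simplifying assumptions: vector length is divisible by the number
    of workers, and 'communication' is just Python list copying.
    """
    n = len(grads)
    size = len(grads[0]) // n
    # Split each worker's gradient into n chunks.
    chunks = [[list(g[i * size:(i + 1) * size]) for i in range(n)]
              for g in grads]

    # Reduce-scatter: after n-1 steps, worker w holds the fully
    # summed chunk (w+1) % n.
    for step in range(n - 1):
        # Snapshot outgoing chunks so sends within a step don't interfere.
        incoming = [list(chunks[w][(w - step) % n]) for w in range(n)]
        for w in range(n):
            dst, c = (w + 1) % n, (w - step) % n
            for i, v in enumerate(incoming[w]):
                chunks[dst][c][i] += v

    # Allgather: circulate the reduced chunks until every worker has all.
    for step in range(n - 1):
        incoming = [list(chunks[w][(w + 1 - step) % n]) for w in range(n)]
        for w in range(n):
            dst, c = (w + 1) % n, (w + 1 - step) % n
            chunks[dst][c] = incoming[w]

    # Every worker now holds the same result; return worker 0's copy.
    return [v for c in chunks[0] for v in c]
```

In real systems each of the 2*(n-1) steps moves only 1/n of the gradient over each link, which is what keeps per-GPU bandwidth roughly constant as the cluster grows; for gradient averaging, the caller divides the summed result by the worker count.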

A technical preview of DDL is available in version 4 of IBM’s PowerAI enterprise deep learning software, which makes this cluster scaling feature available to any organization using deep learning for training their AI models.


About Brad Nemire

Brad Nemire is on the Developer Marketing team and loves reading about all of the fascinating research being done by developers using NVIDIA GPUs. Reach out to Brad on Twitter @BradNemire and let him know how you’re using GPUs to accelerate your research. Brad graduated from San Diego State University and currently resides in San Jose, CA.