Scaling Keras Model Training to Multiple GPUs

Keras is a powerful deep learning meta-framework that sits on top of existing frameworks such as TensorFlow and Theano. Keras is highly productive for developers: defining a model often takes about 50% less code than the native APIs of the underlying frameworks require. This productivity has made it very popular as a university and MOOC teaching tool, and as a rapid prototyping platform for applied researchers and developers.

Unfortunately, Keras is quite slow at single-GPU training and inference, regardless of the backend. It is also hard to run on multiple GPUs without breaking its framework-independent abstraction.

Can we keep Keras’s high-level API while still achieving good single-GPU performance and multi-GPU scaling? It turns out that the answer is yes, thanks to the MXNet backend for Keras and MXNet’s efficient data pipeline. Last week, the MXNet community introduced a release candidate for MXNet v0.11.0 with support for Keras v1.2.

Figure: ResNet-50 training throughput (images per second), comparing Keras with the MXNet backend (green bars) to a native MXNet implementation (blue bars).

In a new NVIDIA Developer Blog post, Marek Kolodziej shows how to use Keras with the MXNet backend to achieve high performance and excellent multi-GPU scaling. As a motivating example, he shows how to build a fast and scalable ResNet-50 model in Keras.
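To give a rough idea of what multi-GPU training with the MXNet backend can look like, here is a minimal sketch. It assumes the MXNet-enabled fork of Keras v1.2, where `compile()` is understood to accept an extra `context` argument listing the devices to train on; stock Keras does not take this argument. The helper names `gpu_contexts` and `compile_multi_gpu` are hypothetical and used only for illustration, and the loss and optimizer are placeholder choices rather than the settings from the full post.

```python
def gpu_contexts(num_gpus):
    """Return MXNet-style device strings: 'gpu(0)', 'gpu(1)', ..."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return ["gpu(%d)" % i for i in range(num_gpus)]


def compile_multi_gpu(model, num_gpus, optimizer="sgd"):
    """Compile a Keras model for data-parallel training on several GPUs.

    ASSUMPTION: the `context` keyword is specific to the MXNet-enabled
    Keras fork; with any other backend this call is expected to fail.
    """
    model.compile(
        loss="categorical_crossentropy",  # placeholder loss for a classifier
        optimizer=optimizer,              # placeholder optimizer
        metrics=["accuracy"],
        context=gpu_contexts(num_gpus),   # devices to spread each batch over
    )
    return model
```

With four GPUs, `gpu_contexts(4)` yields one device string per GPU, and the rest of the Keras model definition and `fit()` call stay unchanged, which is the point of keeping the high-level API.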


About Brad Nemire

Brad Nemire is on the Developer Marketing team and loves reading about all of the fascinating research being done by developers using NVIDIA GPUs. Reach out to Brad on Twitter @BradNemire and let him know how you’re using GPUs to accelerate your research. Brad graduated from San Diego State University and currently resides in San Jose, CA.