Scaling Keras Model Training to Multiple GPUs

Keras is a powerful deep learning meta-framework that sits on top of existing frameworks such as TensorFlow and Theano. Keras is highly productive for developers; defining a model often takes 50% less code than the native APIs of the underlying frameworks require. This productivity has made it very popular as a university and MOOC teaching tool, and as a rapid prototyping platform for applied researchers and developers.

Unfortunately, Keras is quite slow for single-GPU training and inference, regardless of the backend. It is also hard to make it use multiple GPUs without breaking its framework-independent abstraction.

Can this be improved, leveraging Keras’s high-level API, while still achieving good single-GPU performance and multi-GPU scaling? It turns out that the answer is yes, thanks to the MXNet backend for Keras, and MXNet’s efficient data pipeline. Last week, the MXNet community introduced a release candidate for MXNet v0.11.0 with support for Keras v1.2.

Figure: ResNet-50 training throughput (images per second), comparing Keras with the MXNet backend (green bars) to a native MXNet implementation (blue bars).

In a new NVIDIA Developer Blog post, Marek Kolodziej shows how to use Keras with the MXNet backend to achieve high performance and excellent multi-GPU scaling. As a motivating example, he shows how to build a fast and scalable ResNet-50 model in Keras.

Read more >