Developer Blog: Accelerating Apache Spark 3.0 with GPUs and RAPIDS

In this post, the team shows how the new RAPIDS Accelerator for Apache Spark enables GPU acceleration of end-to-end data analytics pipelines, Spark SQL operations, and Spark shuffle operations.

Because many data processing tasks are inherently parallel, the massively parallel architecture of a GPU is a natural fit for accelerating Apache Spark data processing queries, much as GPUs already accelerate deep learning (DL) in artificial intelligence (AI).

NVIDIA has worked with the Apache Spark community to bring GPU acceleration to Spark through the Spark 3.0 release and the open source RAPIDS Accelerator for Apache Spark. In this post, we dive into how the RAPIDS Accelerator for Apache Spark uses GPUs to:

  • Accelerate end-to-end data preparation and model training on the same Spark cluster.
  • Accelerate Spark SQL and DataFrame operations without requiring any code changes.
  • Accelerate data transfer performance across nodes (Spark shuffles).
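Because the accelerator works without application code changes, enabling it is largely a matter of Spark configuration. The sketch below is a minimal illustration, not the official setup: it assumes the RAPIDS Accelerator jar is already on the classpath and uses the Spark 3.0 plugin and GPU resource-scheduling settings. The helper `to_submit_args` and the constant `RAPIDS_CONF` are hypothetical names introduced here, and the GPU amounts are example values to tune for your cluster. Shuffle acceleration needs additional, version-specific settings that are not shown.

```python
from typing import Dict, List

# Example settings (assumed values) for running Spark SQL/DataFrame
# queries on GPUs via the RAPIDS Accelerator plugin.
RAPIDS_CONF: Dict[str, str] = {
    # Spark 3.0 plugin mechanism: load the RAPIDS SQL plugin.
    "spark.plugins": "com.nvidia.spark.SQLPlugin",
    # Turn on GPU execution of supported SQL/DataFrame operations.
    "spark.rapids.sql.enabled": "true",
    # Spark 3.0 resource scheduling: one GPU per executor...
    "spark.executor.resource.gpu.amount": "1",
    # ...shared by up to four concurrent tasks (0.25 GPU each).
    "spark.task.resource.gpu.amount": "0.25",
}


def to_submit_args(conf: Dict[str, str]) -> List[str]:
    """Hypothetical helper: render a config dict as spark-submit --conf flags."""
    args: List[str] = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Print the flags to append to an existing spark-submit command line.
    print(" ".join(to_submit_args(RAPIDS_CONF)))
```

The same key/value pairs could instead be passed to `SparkSession.builder.config(key, value)` in PySpark. Either way, the query code itself stays unchanged.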

Read the full Developer Blog, Accelerating Apache Spark 3.0 with GPUs and RAPIDS.