Accelerating Apache Spark 3.0 with GPUs and RAPIDS

Given the parallel nature of many data processing tasks, it’s only natural that the massively parallel architecture of a GPU should be able to parallelize and accelerate Apache Spark data processing queries, in the same way that a GPU accelerates deep learning (DL) in artificial intelligence (AI). NVIDIA has worked with the Apache Spark community … Continued

Accelerating Single Cell Genomic Analysis using RAPIDS

The human body is made up of nearly 40 trillion cells, of many different types. Recent advances in experimental biology have made it possible to explore the genetic material of single cells. With the birth of this new field of single-cell genomics, scientists can now probe the DNA and RNA of individual cells in the … Continued

Building an Accelerated Data Science Ecosystem: RAPIDS Hits Two Years

GTC Fall 2020 marked the second anniversary of the initial release of RAPIDS. Created out of the GPU Open Analytics Initiative (GoAi) aimed at making accelerated, end-to-end analytics on GPUs easy, RAPIDS has proven GPUs are performant, easy to use, and transformative to the future of data analytics. By thinking about the relationship between software … Continued

Zero to Data Science: Making Data Science Teams Productive with Kubernetes and RAPIDS

Data collected on a vast scale has fundamentally changed the way organizations do business, driving demand for teams to provide meaningful data science, machine learning, and deep learning-based business insights quickly. Data science leaders, plus the DevOps and IT teams supporting them, constantly look for ways to make their teams productive while optimizing their costs … Continued

RAPIDS Accelerates Data Science End-to-End

At GTC Europe in Munich, Germany, NVIDIA announced RAPIDS, a suite of open-source software libraries for executing end-to-end data science and analytics pipelines entirely on GPUs. RAPIDS aims to accelerate the entire data science pipeline including data loading, ETL, model training, and inference. This will enable more productive, interactive, and exploratory workflows. The RAPIDS libraries … Continued

RAPIDS Accelerates Data Science End-to-End

Today’s data science problems demand a dramatic increase in the scale of data as well as the computational power required to process it. Unfortunately, the end of Moore’s law means that handling large data sizes in today’s data science ecosystem requires scaling out to many CPU nodes, which brings its own problems of communication bottlenecks, energy, and … Continued

Fast, Flexible Allocation for NVIDIA CUDA with RAPIDS Memory Manager

When I joined the RAPIDS team in 2018, NVIDIA CUDA device memory allocation was a performance problem. RAPIDS cuDF allocates and deallocates memory at high frequency, because its APIs generally create new Series and DataFrames rather than modifying them in place. The overhead of cudaMalloc and synchronization of cudaFree was holding RAPIDS back. My first … Continued

Using RAPIDS with PyTorch

In this post we take a look at how to use cuDF, the RAPIDS dataframe library, to do some of the preprocessing steps required to get the mortgage data in a format that PyTorch can process so that we can explore the performance of deep learning on tabular data and compare it to the xgboost … Continued

Running Python UDFs in Native NVIDIA CUDA Kernels with the RAPIDS cuDF

In this post, I introduce a design and implementation of a framework within RAPIDS cuDF that enables compiling Python user-defined functions (UDF) and inlining them into native CUDA kernels. This framework uses the Numba Python compiler and Jitify CUDA just-in-time (JIT) compilation library to provide cuDF users the flexibility of Python with the performance of … Continued

Using the RAPIDS VM Image for Google Cloud Platform

NVIDIA’s Ty McKercher and Google’s Viacheslav Kovalevskyi and Gonzalo Gasca Meza jointly authored a post on using the new RAPIDS VM Image for Google Cloud Platform. Following is a short summary. For the full post, please see the full Google article. If you’re a data scientist, researcher, engineer, or developer using pandas, Dask, scikit-learn, … Continued

Speedy Model Training With RAPIDS + Determined AI

Model developers no longer face a steep learning curve to accelerate model training. By utilizing two open-source software projects, Determined AI’s Deep Learning Training Platform and the RAPIDS accelerated data science toolkit, they can easily achieve up to 10x speedups in data preprocessing and train models at scale. Making GPUs accessible: As the field of … Continued