By Joshua Patterson
As I look back on release 0.13, I feel a fraction of how the astronauts on Apollo 13 must have felt: relieved to have landed and grateful for rooms of quick-thinking engineers. Early space missions can teach you a lot about what to do when things don’t go according to plan. You have to keep a level head, follow a checklist, and efficiently improvise solutions when plans A, B, and C don’t work.
When we started building open source tools for GPU-accelerated data science, we had a vision of making the most advanced computing resources available to everyone. To quote a speech JFK made about the race to the moon, “We do these things not because they are easy, but because they are hard.” Let me tell you about some of the hard things the team did for this release to move our vision forward.
The Great cuDF++ Refactor Gets Official Lift-off
I mentioned in the 0.12 release blog that we kicked off “The Great cuDF++ Refactor”. We have made huge progress on this work for release 0.13. Nearly all of the code base has been ported to use the libcudf++ API, which puts us on a firm foundation for multi-language, multi-platform RAPIDS for the foreseeable future. Such a huge refactor can lead to bugs, even when carefully undertaken by the best engineers. So if you notice any problems, please file GitHub issues so we can resolve them quickly.
Summary of RAPIDS Core Library Updates
With everyone having more to think about given the current state of the world, we’re going to split our release blog into two parts. Many of you went from one to three-plus jobs as you’re spending more time at home (and hopefully practicing social distancing), and reading a long release blog may not be top of mind. You can read the long version of all the new RAPIDS improvements on the RAPIDS Medium blog, or skim the summary below by library.
Data Analytics: cuDF
- Python is now hooked up to the “Great libcudf++ Port”
- Expanded groupby aggregations including: `median`, `nunique`, `nth` and `std`
- Additional join methods: semi-joins and anti-joins
- `concatenate` optimizations giving up to 2000x speedups
- Distributed multi-column sorting and multi-column hash partitioning
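cuDF deliberately mirrors the pandas DataFrame API, so the new groupby aggregations and join semantics can be sketched with pandas on CPU (on a GPU machine you would swap `pandas` for `cudf`). The column names and data here are hypothetical, for illustration only:

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "b", "b", "b"],
                   "val": [1.0, 3.0, 2.0, 4.0, 6.0]})

# median, nunique, and std are among the newly supported aggregations
agg = df.groupby("key")["val"].agg(["median", "nunique", "std"])

# A semi-join keeps left rows that have a match on the right; an
# anti-join keeps left rows that do NOT. pandas has no dedicated method,
# but the semantics can be expressed with isin():
left = pd.DataFrame({"key": ["a", "b", "c"]})
right = pd.DataFrame({"key": ["b", "c", "d"]})
semi = left[left["key"].isin(right["key"])]   # keys b, c
anti = left[~left["key"].isin(right["key"])]  # key a
```

In cuDF the semi- and anti-join variants are exposed directly as join methods, avoiding the intermediate boolean mask.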
Machine Learning: cuML and XGBoost
- XGBoost 1.0 launch – including new Dask API
- New configurable input and output types for estimators
- Multi-node, multi-GPU support for linear models (PCA, tSVD, OLS, Ridge)
Graph Analytics: cuGraph
- Betweenness Centrality
- K-Truss Community Detection
- Code Refactoring
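cuGraph follows NetworkX's definitions for its algorithms, so the new betweenness centrality can be sketched on CPU with NetworkX; on a GPU you would build a `cugraph.Graph` from an edge list and call its betweenness centrality routine instead. A minimal sketch:

```python
import networkx as nx

# A simple path graph 0 - 1 - 2: node 1 sits on the only shortest path
# between 0 and 2, so it carries all of the betweenness.
G = nx.path_graph(3)

# Betweenness centrality: the fraction of all-pairs shortest paths that
# pass through each node (normalized by default).
bc = nx.betweenness_centrality(G)
```

For a three-node path, the middle node scores 1.0 and the endpoints 0.0, which is a handy sanity check when comparing CPU and GPU results.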
Data Visualization: cuXfilter
- Polished documentation
- Deck.gl 2D/3D is now the default choropleth map
- Removed library names (e.g. datashader) from our API to simplify chart creation defaults
Geospatial Analysis: cuSpatial
- Adds batch cubic spline interpolation for trajectories and other curves
- Docs are now parsed and rendered to HTML for 0.13 and nightly builds
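Cubic spline interpolation fits a smooth curve exactly through a trajectory's sample points. cuSpatial runs this in batches on the GPU; the underlying math can be sketched for a single trajectory with SciPy's `CubicSpline`. The timestamps and coordinates below are made up for illustration:

```python
import numpy as np
from scipy.interpolate import CubicSpline

t = np.array([0.0, 1.0, 2.0, 3.0])   # sample timestamps
x = np.array([0.0, 2.0, 1.0, 3.0])   # one coordinate of a trajectory

# Fit a piecewise-cubic curve through the samples
spline = CubicSpline(t, x)

# The spline passes exactly through the knots and gives smooth,
# continuous positions at any time in between.
midpoint = float(spline(0.5))
```

Batching many such fits per kernel launch is what makes the GPU version worthwhile: one trajectory is trivial, millions are not.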
Signal Processing: cuSignal
- Conda install support
- Faster polyphase resampler
- New acoustics module.
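cuSignal mirrors the `scipy.signal` API, so the polyphase resampler behaves like `scipy.signal.resample_poly`; the sketch below uses SciPy on CPU (on a GPU you would import `cusignal` instead). The signal and rates are arbitrary examples:

```python
import numpy as np
from scipy.signal import resample_poly

fs = 1000                          # original sample rate, Hz
t = np.arange(0, 1, 1 / fs)
sig = np.sin(2 * np.pi * 10 * t)   # a 10 Hz tone, 1000 samples

# Polyphase resampling by the rational factor up/down = 3/2,
# i.e. 1000 Hz -> 1500 Hz, with built-in anti-aliasing filtering
resampled = resample_poly(sig, up=3, down=2)
```

The polyphase structure only computes the output samples it keeps, which is why it is fast; the 0.13 cuSignal release makes the GPU version faster still.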
RAPIDS Community Updates
- Focused on bug fixes, iterative codebase refactoring, and resolving multi-node, multi-GPU InfiniBand tests.
- Along with a ton of bug fixes, release 0.13 adds ROUND(), CASE with strings, and AVG() support for distributed queries.
- Starts a new feature initiative called “Bigger than GPU,” which allows SQL queries that don’t fit into the available GPU memory to execute.
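ROUND(), CASE over strings, and AVG() are standard SQL constructs; the sketch below illustrates their semantics with Python's built-in sqlite3 (in RAPIDS the equivalent queries run GPU-accelerated over cuDF DataFrames). The table and values are hypothetical:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE taxi (fare REAL, payment TEXT)")
con.executemany("INSERT INTO taxi VALUES (?, ?)",
                [(10.456, "card"), (7.123, "cash"), (12.999, "card")])

# ROUND() to one decimal place
rounded = con.execute("SELECT ROUND(fare, 1) FROM taxi").fetchall()

# CASE expression over a string column
labels = con.execute(
    "SELECT CASE payment WHEN 'card' THEN 'credit' ELSE 'other' END "
    "FROM taxi").fetchall()

# AVG() aggregate over the whole table
avg_fare = con.execute("SELECT AVG(fare) FROM taxi").fetchone()[0]
```

The distributed AVG() case is the interesting one: each worker must return partial sums and counts rather than local averages, so the final average is computed correctly across partitions.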
The Wrap Up
Release 0.13 was major, which means release 0.14 will focus on quality of life improvements. In 0.14 RAPIDS will refine its docs, continue to work with the community on integration, push down its bug count, expand its C++ examples, and add more tests in its CI/CD system. In 0.14 and beyond, we will focus on stability at scale, hardening features, and preparing for 1.0.
As always, we want to thank all of you for using RAPIDS and contributing to our ever-growing GPU-accelerated ecosystem. Please check out our latest release deck or join in on the conversation on Slack or GitHub.
For more details, read the full-length release blog on RAPIDS Medium.