By Joshua Patterson
As I look back on release 0.13, I feel a fraction of how the astronauts on Apollo 13 must have felt: relieved to have landed and grateful for rooms of quick-thinking engineers. Early space missions can teach you a lot about what to do when things don’t go according to plan. You have to keep a level head, follow a checklist, and efficiently improvise solutions when plans A, B, and C don’t work.
When we started building open source tools for GPU-accelerated data science, we had a vision of making the most advanced computing resources available to everyone. To quote a speech JFK made about the race to the moon, “We do these things not because they are easy, but because they are hard.” Let me tell you about some of the hard things the team did for this release to move our vision forward.
The Great cuDF++ Refactor Gets Official Lift-off
I mentioned in the 0.12 release blog that we kicked off “The Great cuDF++ Refactor”. We have made huge progress on this work for release 0.13. Nearly all of the code base has been ported to use the libcudf++ API, which puts us on a firm foundation for multi-language, multi-platform RAPIDS for the foreseeable future. Such a huge refactor can lead to bugs, even when carefully undertaken by the best engineers. So if you notice any problems, please file GitHub issues so we can resolve them quickly.
Summary of RAPIDS Core Library Updates
With everyone having more to think about given the current state of the world, we’re going to split our release blog into two parts. Many of you went from one to three-plus jobs as you’re spending more time at home (and hopefully practicing social distancing), and reading a long release blog may not be top of mind. You can read the long version of all the new RAPIDS improvements on the RAPIDS Medium blog, or skim the summary below by library.
Data Analytics: cuDF
- Python is now hooked up to the “Great libcudf++ Port”
- Expanded groupby aggregations including: `median`, `nunique`, `nth` and `std`
- Additional join methods: semi-joins and anti-joins
- `concatenate` optimizations giving up to 2000x speedups
- Distributed multi-column sorting and multi-column hash partitioning
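cuDF deliberately mirrors the pandas DataFrame API, so the new groupby aggregations and join semantics can be sketched with pandas on CPU (on a GPU machine you would swap `pandas` for `cudf`). The column names and data here are hypothetical, for illustration only:

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "b", "b", "b"],
                   "val": [1.0, 3.0, 2.0, 4.0, 6.0]})

# median, nunique, and std are among the newly supported aggregations
agg = df.groupby("key")["val"].agg(["median", "nunique", "std"])

# A semi-join keeps left rows that have a match on the right; an
# anti-join keeps left rows that do NOT. pandas has no dedicated method,
# but the semantics can be expressed with isin():
left = pd.DataFrame({"key": ["a", "b", "c"]})
right = pd.DataFrame({"key": ["b", "c", "d"]})
semi = left[left["key"].isin(right["key"])]   # keys b, c
anti = left[~left["key"].isin(right["key"])]  # key a
```

In cuDF the semi- and anti-join variants are exposed directly as join methods, avoiding the intermediate boolean mask.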
Machine Learning: cuML and XGBoost
- XGBoost 1.0 launch – including new Dask API
- New configurable input and output types for estimators
- Multi-node, multi-GPU support for linear models (PCA, tSVD, OLS, Ridge)
Graph Analytics: cuGraph
- Betweenness Centrality
- K-Truss Community Detection
- Code Refactoring
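cuGraph follows NetworkX's definitions for its algorithms, so the new betweenness centrality can be sketched on CPU with NetworkX; on a GPU you would build a `cugraph.Graph` from an edge list and call its betweenness centrality routine instead. A minimal sketch:

```python
import networkx as nx

# A simple path graph 0 - 1 - 2: node 1 sits on the only shortest path
# between 0 and 2, so it carries all of the betweenness.
G = nx.path_graph(3)

# Betweenness centrality: the fraction of all-pairs shortest paths that
# pass through each node (normalized by default).
bc = nx.betweenness_centrality(G)
```

For a three-node path, the middle node scores 1.0 and the endpoints 0.0, which is a handy sanity check when comparing CPU and GPU results.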
Data Visualization: cuXfilter
- Polished documentation
- Deck.gl 2D/3D is now the default choropleth map
- Removed library names (e.g. datashader) from our API to simplify chart creation defaults
Geospatial Analysis: cuSpatial
- Adds batch cubic spline interpolation for trajectories and other curves
- Docs are now parsed and rendered to HTML for 0.13 and nightly builds
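Cubic spline interpolation fits a smooth curve exactly through a trajectory's sample points. cuSpatial runs this in batches on the GPU; the underlying math can be sketched for a single trajectory with SciPy's `CubicSpline`. The timestamps and coordinates below are made up for illustration:

```python
import numpy as np
from scipy.interpolate import CubicSpline

t = np.array([0.0, 1.0, 2.0, 3.0])   # sample timestamps
x = np.array([0.0, 2.0, 1.0, 3.0])   # one coordinate of a trajectory

# Fit a piecewise-cubic curve through the samples
spline = CubicSpline(t, x)

# The spline passes exactly through the knots and gives smooth,
# continuous positions at any time in between.
midpoint = float(spline(0.5))
```

Batching many such fits per kernel launch is what makes the GPU version worthwhile: one trajectory is trivial, millions are not.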
Signal Processing: cuSignal
- Conda install support
- Faster polyphase resampler
- New acoustics module.
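cuSignal mirrors the `scipy.signal` API, so the polyphase resampler behaves like `scipy.signal.resample_poly`; the sketch below uses SciPy on CPU (on a GPU you would import `cusignal` instead). The signal and rates are arbitrary examples:

```python
import numpy as np
from scipy.signal import resample_poly

fs = 1000                          # original sample rate, Hz
t = np.arange(0, 1, 1 / fs)
sig = np.sin(2 * np.pi * 10 * t)   # a 10 Hz tone, 1000 samples

# Polyphase resampling by the rational factor up/down = 3/2,
# i.e. 1000 Hz -> 1500 Hz, with built-in anti-aliasing filtering
resampled = resample_poly(sig, up=3, down=2)
```

The polyphase structure only computes the output samples it keeps, which is why it is fast; the 0.13 cuSignal release makes the GPU version faster still.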
RAPIDS Community Updates
- Focused on bug fixes, iterative codebase refactoring, and resolving multi-node, multi-GPU InfiniBand tests.
- Along with a ton of bug fixes, release 0.13 adds ROUND(), CASE with strings, and AVG() support for distributed queries.
- Starts a new feature initiative called “Bigger than GPU,” which allows SQL queries that don’t fit into the available GPU memory to execute.
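ROUND(), CASE over strings, and AVG() are standard SQL constructs; the sketch below illustrates their semantics with Python's built-in sqlite3 (in RAPIDS the equivalent queries run GPU-accelerated over cuDF DataFrames). The table and values are hypothetical:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE taxi (fare REAL, payment TEXT)")
con.executemany("INSERT INTO taxi VALUES (?, ?)",
                [(10.456, "card"), (7.123, "cash"), (12.999, "card")])

# ROUND() to one decimal place
rounded = con.execute("SELECT ROUND(fare, 1) FROM taxi").fetchall()

# CASE expression over a string column
labels = con.execute(
    "SELECT CASE payment WHEN 'card' THEN 'credit' ELSE 'other' END "
    "FROM taxi").fetchall()

# AVG() aggregate over the whole table
avg_fare = con.execute("SELECT AVG(fare) FROM taxi").fetchone()[0]
```

The distributed AVG() case is the interesting one: each worker must return partial sums and counts rather than local averages, so the final average is computed correctly across partitions.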
The Wrap Up
Release 0.13 was major, which means release 0.14 will focus on quality of life improvements. In 0.14 RAPIDS will refine its docs, continue to work with the community on integration, push down its bug count, expand its C++ examples, and add more tests in its CI/CD system. In 0.14 and beyond, we will focus on stability at scale, hardening features, and preparing for 1.0.
As always, we want to thank all of you for using RAPIDS and contributing to our ever-growing GPU-accelerated ecosystem. Please check out our latest release deck or join in on the conversation on Slack or GitHub.
For more details, read the full-length release blog on RAPIDS Medium.