RAPIDS cuGraph: multi-GPU PageRank

RAPIDS cuGraph is on a mission to provide multi-GPU graph analytics to allow users to scale to billion and even trillion scale graphs. The first step along that path is the new release of a single-node multi-GPU version of PageRank.

Experimental results show that an end-to-end pipeline involving the new multi-GPU PageRank is on average 80x faster than Apache Spark when comparing one NVIDIA DGX-2 vs 100 Spark nodes on a 300GB dataset. At the CUDA level, a graph of this size is traversed at a speed of 38 billion edges per second on a single node.

PageRank measures the relative importance of elements in a graph by creating a score based on the propagation of influence between nodes. The underlying assumption is that important nodes are linked to from other important nodes. A node that is linked to by any other with high PageRank receives a high rank itself.

Rapids software stack

One interesting aspect of PageRank is its ability to account for the probability of not following links. For example, on the graph that represents the web, it is estimated that a user has an 85% chance to click on a link on the current webpage (continue surfing) and a 15% to simply open a new window and jump to a random web page. In the algorithm, this is the alpha parameter, which is also referred to as the teleport or damping factor parameter.

To illustrate how to use the new multi-GPU PageRank analytics, we have released two new notebooks. The first notebook is intended to get you started in a few minutes on a real-world example. In this notebook, we analyze a Twitter dataset (26GB on disk) with 41.7 million users with 1.47 billion social relations (edges) to find out the most influential profiles.

The example uses two NVIDIA V100 32GB GPUs, but we encourage users to experiment with different configurations, as long as the sum of GPU memory is 64GB or more. The notebook walks through reading the CSV file using RAPIDS cuDF, then computing the PageRank score, and then sorting the PageRank scores using cuDF. Finally, we use a map to convert Vertex IDs into Twitter’s numeric IDs and retrieve the Twitter username.

The second notebook requires a DGX-2 or comparable server with 16 fully connected 32GB V100s. The reason is that the notebook explores processing a 300GB (on disk) HiBench dataset (Zipfian distribution). A good rule of thumb is to have twice the amount of GPU memory as the file size, so the CSV parser has room for the raw data and extracted columns.

This notebook shows how to load a large input split across multiple files (32 partitions) and across multiple GPUs so that the overhead per file is small enough to process 300GB of raw data in only 512GB of GPU memory. Once the data is loaded, then the Multi-GPU PageRank analytic is called as in the first notebook.

Based on our experimental results, the novel multi-GPU PageRank feature can analyze over 16 billion links to rank half a billion nodes in just a few seconds on a DGX2. See see these results and further details on the workflow and benchmarks read the RAPIDS.ai blog here