An Interactive 2010 Census Plotly-dash Visualization Accelerated By RAPIDS

The COVID-19 pandemic brings the efforts of the data science community to the forefront. Real-time, interactive visualizations of the novel coronavirus’ spread across populations help researchers, scientists, health officials and governments understand, validate, and communicate important insights hidden among hundreds of millions of rows of records.

NVIDIA and Plot.ly, a premier member of NVIDIA Inception, have released a landmark COVID-19 visualization dashboard built on Plot.ly Dash and accelerated by RAPIDS cuDF running on GPUs. Using modified 2010 US Census data and live COVID case reporting, users can view coronavirus cases in high resolution by panning and zooming from US national aggregates down to individuals on a neighborhood block.

Today, basic line charts are predominantly used to chart the spread of COVID. These graphs have limited capabilities: they track only two dimensions, quantity over time, and only at scales ranging from large counties to the whole nation. The COVID dashboard on Plot.ly Dash allows users to derive deeper insights into the spread of the virus, with each individual mapped to a single point on their own residential block. Interacting with high-fidelity data allows data scientists to discover patterns at every scale and draw novel understandings of the pandemic’s spread.

Simplifying a Data Scientist’s Workflow

Census population datasets are spread out in time and space over multiple dimensions. Creating performant applications on top of such a complex dataset would typically require several weeks of work, multiple teams, and several tiers of architecture. Running directly from a web browser, the Plot.ly dashboard integrates Plot.ly Dash, RAPIDS cuDF, and Datashader for real-time cross filtering on GPUs. This empowers data scientists to extract valuable insights by working independently across the whole stack, from raw data to user interface, and to quickly deliver interactive dashboards without the need for extensive developer and IT teams. Because results are computed live on GPUs, there is no need to pre-aggregate the population or use data subsets.
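To make the idea of live cross filtering concrete, here is a minimal sketch in plain Python. It is not the dashboard's code: the toy records, the `cross_filter` function, and the bounding box are hypothetical and chosen only for illustration. The real application would run the equivalent filter on GPU DataFrames with RAPIDS cuDF and render the points with Datashader. The point it shows is that every view is recomputed from the raw rows on each interaction, with no pre-aggregated tables.

```python
from collections import Counter

# Hypothetical toy records: (longitude, latitude, age_group).
# These are made-up values, not real census or COVID data.
PEOPLE = [
    (-81.4, 28.5, "0-17"),
    (-81.3, 28.6, "18-64"),
    (-80.2, 25.8, "65+"),
    (-80.1, 25.7, "18-64"),
    (-122.4, 37.8, "18-64"),
]


def in_box(lon, lat, box):
    """Return True if the point lies inside box = (min_lon, min_lat, max_lon, max_lat)."""
    min_lon, min_lat, max_lon, max_lat = box
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


def cross_filter(people, box=None, age_groups=None):
    """Apply the map selection and the chart selection together,
    then recompute both views from the rows that survive."""
    selected = [
        (lon, lat, age)
        for lon, lat, age in people
        if (box is None or in_box(lon, lat, box))
        and (age_groups is None or age in age_groups)
    ]
    points = [(lon, lat) for lon, lat, _ in selected]       # feeds the map view
    histogram = Counter(age for _, _, age in selected)      # feeds the bar chart
    return points, histogram


# Rough, hypothetical bounding box around the south-eastern US.
SOUTHEAST_BOX = (-88.0, 24.0, -79.0, 31.0)

points, histogram = cross_filter(PEOPLE, box=SOUTHEAST_BOX)
narrowed_points, narrowed_histogram = cross_filter(
    PEOPLE, box=SOUTHEAST_BOX, age_groups={"18-64"}
)
```

Selecting a region on the map changes the histogram, and selecting a bar in the histogram changes the map. Because both come from the same filtered rows, the views stay consistent at any zoom level.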

Figure 1: RAPIDS GPU Data Science Platform Overview

The Importance of GPUs in Data Science

Visualization query load times are extremely important to a data scientist. Split-second load times ensure data scientists spend more time exploring data than waiting for queries to process. Long load times cause users to become selective about which areas of the visualization to explore, leading to pre-validated steps and conclusions guided by biased assumptions. Traditional data science workflows using CPUs are typically slow and cumbersome. GPU acceleration brings superior performance and higher ROI with reduced infrastructure costs.

For comparison, in the COVID dashboard, a query of Florida COVID cases takes 47 seconds to load in pandas CPU mode. The same query takes only 0.3 seconds on a single NVIDIA Titan RTX+ GPU with the RAPIDS cuDF library.
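The two timings above imply a speedup of roughly 157 times. The short sketch below works that figure out and includes a small, hypothetical `time_query` helper of the kind one might use to collect such measurements. The helper and its `repeats` parameter are assumptions for illustration, not the method used to produce the numbers in this article.

```python
import time


def time_query(run_query, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` runs.

    `run_query` is any zero-argument callable, for example a function
    that filters a DataFrame to one state.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        best = min(best, time.perf_counter() - start)
    return best


def speedup(baseline_seconds, accelerated_seconds):
    """Ratio of the baseline time to the accelerated time."""
    return baseline_seconds / accelerated_seconds


# Timings reported in the article for the Florida query.
CPU_SECONDS = 47.0  # pandas on CPU
GPU_SECONDS = 0.3   # RAPIDS cuDF on a single GPU

ratio = speedup(CPU_SECONDS, GPU_SECONDS)
print(f"Speedup: about {ratio:.0f}x")  # prints "Speedup: about 157x"

# Example of timing an arbitrary stand-in workload with the helper.
elapsed = time_query(lambda: sum(range(100_000)))
```

Taking the best of several runs is one common way to reduce noise from caching and other processes. The reported figures may have been measured differently.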

Figure 2: Querying COVID-19 cases within the state of Florida

Data Science for the Masses

In May 2020, NVIDIA and Plot.ly announced a partnership to bring easily attainable GPU-accelerated AI and ML to a wider business audience. With Plot.ly’s custom GUI (graphical user interface) and intuitive data visualization, non-experts can explore large datasets without relying on a data scientist to run custom queries.

More Information

Learn more about the COVID dashboard, as well as NVIDIA data scientists’ best practices for data and design processes, in the dedicated on-demand webinar, part of the Compute4COVID series.

Explore how AI, accelerated computing, and technology are contributing to the worldwide battle against the novel coronavirus in the COVID-19 Research Hub.

Developer Resources

Plot.ly Dash is an open-source framework for building interactive, browser-based dashboards using Python. RAPIDS is a suite of open-source GPU data science software libraries built on NVIDIA CUDA-X AI. The RAPIDS Plot.ly Dash census demo is available for download from its GitHub repository.
