Nsight Compute is an interactive kernel profiler for visualizing, debugging, and improving the performance of CUDA applications. It collects a wide variety of detailed metrics and presents them visually, helping you improve code efficiency and performance.
The new blog post “Accelerating HPC Applications with Nsight Compute Roofline Analysis” focuses on understanding hardware limitations and how they affect a developer’s ability to tune code for performance. By understanding the theoretical compute and memory bandwidth limits of a particular system for a specific code, developers can target the parts of their workload that will provide the best return on optimization effort.
Charlene Yang and Samuel Williams are application performance specialists working on cutting-edge workloads at the National Energy Research Scientific Computing Center (NERSC) at Lawrence Berkeley National Laboratory (LBNL). In this blog post, with the help of NVIDIA’s Max Katz and Jackson Marusarz, the authors demonstrate how Roofline Analysis shows where your kernel’s performance stands relative to the peak achievable system limits. By plotting your application’s arithmetic intensity against its achieved FLOP/s, you can more effectively performance-tune HPC codes running on NVIDIA GPUs.
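As a rough illustration of the idea, the basic roofline model bounds a kernel's attainable performance by the lower of two ceilings: the peak compute rate, and the arithmetic intensity (FLOPs per byte moved) multiplied by the peak memory bandwidth. The sketch below is a minimal Python version of that bound. The function names and the peak values are hypothetical and chosen only for illustration; they are not taken from the blog post, from any particular GPU, or from Nsight Compute's own interface.

```python
def attainable_flops(arithmetic_intensity, peak_flops, peak_bandwidth):
    """Roofline bound in FLOP/s.

    arithmetic_intensity: FLOPs performed per byte of memory traffic
    peak_flops:           compute ceiling in FLOP/s
    peak_bandwidth:       memory ceiling in bytes/s
    """
    return min(peak_flops, arithmetic_intensity * peak_bandwidth)


def ridge_point(peak_flops, peak_bandwidth):
    """Arithmetic intensity (FLOPs/byte) at which the two ceilings meet."""
    return peak_flops / peak_bandwidth


def classify(arithmetic_intensity, peak_flops, peak_bandwidth):
    """Label a kernel by which ceiling limits it in this simple model."""
    if arithmetic_intensity < ridge_point(peak_flops, peak_bandwidth):
        return "memory-bound"
    return "compute-bound"


if __name__ == "__main__":
    # Hypothetical machine limits, for illustration only.
    PEAK_FLOPS = 8.0e12      # 8 TFLOP/s
    PEAK_BANDWIDTH = 1.0e12  # 1 TB/s

    # Hypothetical kernels with low and high arithmetic intensity.
    for ai in (0.5, 32.0):
        bound = attainable_flops(ai, PEAK_FLOPS, PEAK_BANDWIDTH)
        kind = classify(ai, PEAK_FLOPS, PEAK_BANDWIDTH)
        print(f"AI = {ai} FLOPs/byte: bound = {bound:.2e} FLOP/s ({kind})")
```

In this simplified picture, a kernel to the left of the ridge point gains most from reducing memory traffic, while one to the right gains most from improving compute throughput. A profiler's roofline chart adds the kernel's measured performance, so you can see how far it sits below the applicable ceiling.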
Nsight Compute is a vital component of the suite of Nsight tools. A developer can start with Nsight Systems to see the big picture and avoid choosing less effective optimizations based on assumptions and false-positive indicators. Then use Nsight Compute to drill down into kernel behavior, including the Roofline Analysis discussed in this blog post. Learn more about Nsight Compute, and read the new Roofline blog post.
Contact us at our forums and visit our Download Center to access the latest release of the Nsight tools. The Nsight tools are available free of charge to registered NVIDIA Developer Program members and as part of the CUDA Toolkit.