Maximizing Unified Memory Performance in CUDA

Many of today’s applications process large volumes of data. GPUs have very fast HBM or GDDR memory, but its capacity is limited. Getting the most performance out of a GPU requires keeping the data as close to the GPU as possible. This matters most for applications that iterate over the same data many times or have a high flops/byte ratio. Because GPU memory capacity is limited, many real-world codes can keep only part of their data on the GPU, and it is the programmer’s responsibility to move only the necessary parts of the working set into GPU memory.

Traditionally, developers have used explicit memory copies to transfer data. This usually gives the best performance, but it requires very careful management of GPU resources and predictable access patterns. Zero-copy access provides fine-grained direct access to the entire system memory, but its speed is limited by the interconnect (PCIe or NVLink), and because the data never moves into GPU memory, it cannot take advantage of data locality.
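
To make the contrast concrete, here is a minimal CUDA sketch of both approaches. It assumes a 64-bit system with unified virtual addressing, so mapped pinned memory needs no extra device flags. The kernel `increment` and the helpers `increment_explicit` and `increment_zero_copy` are hypothetical names chosen for illustration. The explicit version stages the data in GPU memory with `cudaMemcpy`; the zero-copy version allocates pinned, mapped host memory with `cudaHostAlloc(..., cudaHostAllocMapped)`, so every kernel access travels over the interconnect.

```cuda
#include <cassert>
#include <cuda_runtime.h>

constexpr unsigned kThreads = 256;

// Hypothetical kernel: adds 1 to every element.
__global__ void increment(float *data, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

static unsigned blocks_for(size_t n) {
    return (unsigned)((n + kThreads - 1) / kThreads);
}

// Explicit copies: stage the data in GPU memory, run the kernel there, copy back.
bool increment_explicit(float *host, size_t n) {
    assert(n > 0);
    const size_t bytes = n * sizeof(float);
    float *dev = nullptr;
    if (cudaMalloc(reinterpret_cast<void **>(&dev), bytes) != cudaSuccess) return false;
    cudaError_t err = cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        increment<<<blocks_for(n), kThreads>>>(dev, n);  // reads/writes fast GPU memory
        err = cudaGetLastError();
    }
    // A blocking cudaMemcpy on the default stream waits for the kernel to finish.
    if (err == cudaSuccess) err = cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return err == cudaSuccess;
}

// Zero-copy: the kernel accesses pinned, mapped host memory directly over the
// interconnect; nothing is copied or migrated into GPU memory.
bool increment_zero_copy(float *values, size_t n) {
    assert(n > 0);
    const size_t bytes = n * sizeof(float);
    float *mapped = nullptr;
    if (cudaHostAlloc(reinterpret_cast<void **>(&mapped), bytes, cudaHostAllocMapped) != cudaSuccess)
        return false;
    for (size_t i = 0; i < n; ++i) mapped[i] = values[i];

    float *dev_view = nullptr;  // device-side address of the same host buffer
    cudaError_t err = cudaHostGetDevicePointer(reinterpret_cast<void **>(&dev_view), mapped, 0);
    if (err == cudaSuccess) {
        increment<<<blocks_for(n), kThreads>>>(dev_view, n);  // every access crosses PCIe/NVLink
        err = cudaGetLastError();
    }
    // Kernel launches are asynchronous: wait before the CPU reads the buffer.
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err == cudaSuccess) {
        for (size_t i = 0; i < n; ++i) values[i] = mapped[i];
    }
    cudaFreeHost(mapped);
    return err == cudaSuccess;
}
```

In the explicit path, further kernels over the same buffer would keep reading fast GPU memory; in the zero-copy path, each of them would cross PCIe or NVLink again, which is the locality limitation described above.
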
Unified Memory combines the advantages of explicit copies and zero-copy access: the GPU can access any page of the entire system memory and, at the same time, migrate data on demand to its own memory for high-bandwidth access. To get the best Unified Memory performance, it’s important to understand how on-demand page migration works.
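
A minimal sketch of the Unified Memory version of the same pattern follows. It assumes a Pascal or later GPU on a platform with on-demand paging (for example Linux), where the GPU can fault on managed pages and migrate them as they are touched. The names `scale` and `scale_managed` are hypothetical. A single `cudaMallocManaged` pointer is used by both the CPU and the GPU, with no explicit copies.

```cuda
#include <cassert>
#include <cuda_runtime.h>

constexpr unsigned kThreads = 256;

// Hypothetical kernel: multiplies every element by factor.
__global__ void scale(float *data, size_t n, float factor) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical helper: initializes managed memory on the CPU, scales it on the
// GPU, then sums it on the CPU. Returns the sum, or -1.0 on error.
double scale_managed(size_t n, float factor) {
    assert(n > 0);
    float *data = nullptr;
    // One allocation, one pointer, valid on both the CPU and the GPU.
    if (cudaMallocManaged(&data, n * sizeof(float)) != cudaSuccess) return -1.0;

    // First touch on the CPU: the pages are populated in system memory.
    for (size_t i = 0; i < n; ++i) data[i] = 1.0f;

    // With on-demand paging, the kernel's first access to each page faults and
    // the driver migrates that page to GPU memory. On platforms without it, the
    // driver migrates managed data before the kernel starts instead.
    const unsigned blocks = (unsigned)((n + kThreads - 1) / kThreads);
    scale<<<blocks, kThreads>>>(data, n, factor);
    cudaError_t err = cudaGetLastError();
    // The CPU must not read the data until the kernel has finished.
    if (err == cudaSuccess) err = cudaDeviceSynchronize();

    double sum = 0.0;
    if (err == cudaSuccess) {
        // CPU reads typically fault the pages back to system memory on demand.
        for (size_t i = 0; i < n; ++i) sum += data[i];
    }
    cudaFree(data);
    return err == cudaSuccess ? sum : -1.0;
}
```

The code is simpler than either version in the previous sketch, but the cost of migration has not disappeared: it is now paid inside the kernel, one page fault at a time.
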
In a new NVIDIA Developer Blog post, Nikolay Sakharnykh, a Senior Developer Technology Engineer at NVIDIA, breaks down Unified Memory page migration step by step and shows how to optimize your code to get the most out of Unified Memory.
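
On-demand migration has overhead: each GPU page fault must be serviced by the driver before the faulting threads can continue. One widely used way to avoid paying this cost inside a kernel is to prefetch managed memory with `cudaMemPrefetchAsync` before launching work. The sketch below is an illustration, not code taken from the linked post. It assumes the CUDA 11/12 form `cudaMemPrefetchAsync(ptr, bytes, device, stream)` (newer toolkits also provide a `cudaMemLocation`-based variant), and `add_one` / `add_one_prefetched` are hypothetical names. Prefetching is skipped when the device reports no concurrent managed access, because prefetching to a GPU requires that attribute.

```cuda
#include <cassert>
#include <cuda_runtime.h>

constexpr unsigned kThreads = 256;

// Hypothetical kernel: adds 1 to every element.
__global__ void add_one(float *data, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Hypothetical helper: zero-fills managed memory on the CPU, prefetches it to the
// GPU, runs add_one, prefetches it back, and returns the CPU-side sum (-1.0 on error).
double add_one_prefetched(size_t n) {
    assert(n > 0);
    int device = 0;
    if (cudaGetDevice(&device) != cudaSuccess) return -1.0;

    // Prefetching to a GPU requires concurrent managed access (e.g. Pascal+ on Linux).
    int concurrent = 0;
    cudaDeviceGetAttribute(&concurrent, cudaDevAttrConcurrentManagedAccess, device);

    const size_t bytes = n * sizeof(float);
    float *data = nullptr;
    if (cudaMallocManaged(&data, bytes) != cudaSuccess) return -1.0;
    for (size_t i = 0; i < n; ++i) data[i] = 0.0f;  // pages start in system memory

    cudaStream_t stream;
    if (cudaStreamCreate(&stream) != cudaSuccess) { cudaFree(data); return -1.0; }

    cudaError_t err = cudaSuccess;
    if (concurrent) {
        // Migrate the whole range up front instead of faulting page by page in the kernel.
        err = cudaMemPrefetchAsync(data, bytes, device, stream);
    }
    if (err == cudaSuccess) {
        const unsigned blocks = (unsigned)((n + kThreads - 1) / kThreads);
        add_one<<<blocks, kThreads, 0, stream>>>(data, n);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess && concurrent) {
        // Move the results back to system memory before the CPU reads them.
        err = cudaMemPrefetchAsync(data, bytes, cudaCpuDeviceId, stream);
    }
    // Wait for all GPU work; also required before CPU access on systems
    // without concurrent managed access.
    if (err == cudaSuccess) err = cudaDeviceSynchronize();

    double sum = 0.0;
    if (err == cudaSuccess) {
        for (size_t i = 0; i < n; ++i) sum += data[i];
    }
    cudaStreamDestroy(stream);
    cudaFree(data);
    return err == cudaSuccess ? sum : -1.0;
}
```

Because both prefetches and the kernel are issued to the same stream, they execute in order: the GPU-side prefetch completes before the kernel starts, and the host-side prefetch starts only after the kernel finishes.
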