NVSwitch: Leveraging NVLink to Maximum Effect

GPUs have been PCIe devices for many generations in client systems, and more recently in servers. The rapid growth in deep learning workloads has driven the need for a faster and more scalable interconnect, as PCIe bandwidth increasingly becomes the bottleneck at the multi-GPU system lev … Read more

Using CUDA Warp-Level Primitives

NVIDIA GPUs execute groups of threads known as warps in SIMT (Single Instruction, Multiple Thread) fashion. Many CUDA programs achieve high performance by taking advantage of warp execution … Read more

Hybridizer: High-Performance C# on GPUs

Hybridizer is a compiler from Altimesh that lets you program GPUs and other accelerators from C# code or .NET Assembly. Using decorated symbols to express parallelism, Hybridizer generates source code or binaries optimized for multicore CPUs and GPUs. In this blog post we illustrate the CUDA target … Read more