Using CUDA Warp-Level Primitives

NVIDIA GPUs execute groups of threads known as warps in SIMT (Single Instruction, Multiple Thread) fashion. Many CUDA programs achieve high performance by taking advantage of warp execution …

Hybridizer: High-Performance C# on GPUs

Hybridizer is a compiler from Altimesh that lets you program GPUs and other accelerators from C# code or .NET Assembly. Using decorated symbols to express parallelism, Hybridizer generates source code or binaries optimized for multicore CPUs and GPUs. In this blog post we illustrate the CUDA target …

How Jet Built a GPU-Powered Fulfillment Engine with F# and CUDA

Have you ever looked at your shopping list and tried to optimize your trip based on things like distance to store, price, and number of items you can buy at each store? The quest for a smarter shopping cart is never-ending, and the complexity of finding even a sub-optimal solution to this problem can quickly …

New PGI Community Edition Now Available

PGI Compilers & Tools are used by scientists and engineers developing applications for high-performance computing (HPC). PGI products deliver world-class multicore CPU performance, an easy on-ramp to GPU computing with OpenACC directives, and performance portability across all major HPC platforms. Version 17.10 is available now for users with current PGI Professional support …

Google Cloud Lowers the Price of NVIDIA Tesla GPUs

Google announced they are cutting the price of NVIDIA Tesla GPUs in the cloud by up to 36 percent. In US regions, each K80 GPU attached to a Google Compute Engine virtual machine is priced at $0.45 per hour while each P100 costs $1.46 per hour. …

Maximizing Unified Memory Performance in CUDA

Many of today’s applications process large volumes of data. While GPU architectures have very fast HBM or GDDR memory, they have limited capacity. Making the most of GPU performance requires the data to be as close to the GPU as possib …