Tuned math libraries are an easy and dependable way to extract the ultimate performance from your HPC system. However, for long-lived applications or those that need to run on a variety of platforms, adapting library calls for each vendor or library version can be a maintenance nightmare.
A compiler that can automatically generate calls to tuned math libraries gives you the best of both worlds: easy portability and ultimate performance. In this post, I show how you can seamlessly accelerate many standard Fortran array intrinsics and language constructs on GPUs. The nvfortran compiler enables this acceleration automatically by mapping Fortran statements to the functions available in the NVIDIA cuTENSOR library, a first-of-its-kind, GPU-accelerated, tensor linear algebra library providing tensor contraction, reduction, and element-wise operations.