Evaluating the Performance of OpenACC in GCC

A new blog post details the history of the OpenACC implementation in GCC, its availability, and recent enhancements to GCC’s OpenACC support. You will also learn about a recent project to assess and improve the performance of codes compiled with GCC’s OpenACC support.

The Role of OpenACC

A scalar optimizing compiler has a really good day when it finds an optimization that boosts performance by 5%. Scalar architectures offer (relatively) limited opportunities for optimization, and scalar optimizing compilers have decades of theory and implementation effort behind them. The landscape is different for compilers (often called “restructuring compilers”) that target parallel and vector hardware such as that found in GPUs. Effectively used, parallel and vector hardware provides speedups that easily dwarf scalar optimizers’ best days. “Effectively used” is the key phrase. Parallel hardware deployed effectively provides speedups measured on logarithmic scales. As frustrated programmers are well aware, parallel hardware deployed ineffectively provides negative speedups (i.e., slowdowns), which is particularly frustrating on a system where speedups of 10-100x are expected.

The OpenACC API defines a collection of directives and routines developed to help soothe frustrated programmers. Using OpenACC directives, a programmer helps a compiler uncover and schedule parallelism, particularly on GPUs, which offer a rich variety of parallel opportunities. PGI has been an early leader in OpenACC development, and the PGI compiler is the most mature implementation of OpenACC. GCC, on the other hand, is a relative newcomer to OpenACC.
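
To make the idea of a directive concrete, here is a minimal sketch of what an OpenACC-annotated loop can look like in C. It is not taken from the blog post: the function name saxpy, its parameters, and the choice of data clauses are illustrative assumptions.

```c
/* Illustrative sketch only: saxpy and its parameters are hypothetical
   names chosen for this example, not code from the blog post. */

/* Computes y[i] = a * x[i] + y[i] for every i in [0, n). */
void saxpy(int n, float a, const float *x, float *y)
{
    /* The directive tells the compiler that the loop iterations are
       independent and may run in parallel on an accelerator.
       copyin: x is copied to the device before the loop.
       copy:   y is copied to the device and back to the host afterwards. */
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

With GCC, OpenACC directives are enabled by the -fopenacc option. Without that option the pragma is ignored and the loop runs serially on the host with the same result. Whether the loop is actually offloaded to a GPU depends on how the GCC installation was built and configured.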
