The Hessenberg reduction (HR) algorithm plays a significant role in solving nonsymmetric eigenvalue problems. By porting it to the GPU, Tomov and Dongarra, of the University of Tennessee and Oak Ridge National Laboratory, report a 16x performance improvement over the LAPACK 3.1 algorithm running on current multicore CPUs alone (in double-precision arithmetic). The paper also shows a way to accelerate a large and important class of dense linear algebra (DLA) algorithms, namely the two-sided factorizations.
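As background, a Hessenberg reduction computes an orthogonal matrix Q and an upper Hessenberg matrix H (all zeros below the first subdiagonal) such that A = Q H Q^T. It is called two-sided because each Householder reflector is applied to the matrix from both the left and the right. The following is a minimal, unblocked pure-Python sketch of that textbook Householder procedure, given for illustration only. It is not the authors' hybrid CPU/GPU algorithm and not the LAPACK code, and the function names (`hessenberg`, `matmul`, `transpose`) are hypothetical.

```python
import math


def matmul(A, B):
    """Plain triple-loop matrix product for lists of lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def hessenberg(A):
    """Reduce square A to upper Hessenberg H with orthogonal Q so that A = Q H Q^T.

    Unblocked Householder reduction (illustrative sketch, not a tuned library routine).
    """
    n = len(A)
    H = [row[:] for row in A]
    Q = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n - 2):
        # Householder vector v such that P = I - 2 v v^T / (v^T v) maps
        # H[k+1:, k] to a multiple of e1, zeroing H[k+2:, k].
        x = [H[i][k] for i in range(k + 1, n)]
        alpha = math.sqrt(sum(t * t for t in x))
        if alpha == 0.0:
            continue  # column already has the required zeros
        if x[0] > 0:
            alpha = -alpha  # sign choice avoids cancellation in v[0]
        v = x[:]
        v[0] -= alpha
        vnorm2 = sum(t * t for t in v)
        m = len(v)
        # Left update H <- P H (acts on rows k+1..n-1).
        for j in range(n):
            f = 2.0 * sum(v[i] * H[k + 1 + i][j] for i in range(m)) / vnorm2
            for i in range(m):
                H[k + 1 + i][j] -= f * v[i]
        # Right update H <- H P (acts on columns k+1..n-1): the second side.
        for i in range(n):
            f = 2.0 * sum(H[i][k + 1 + j] * v[j] for j in range(m)) / vnorm2
            for j in range(m):
                H[i][k + 1 + j] -= f * v[j]
        # Accumulate Q <- Q P so that A = Q H Q^T keeps holding.
        for i in range(n):
            f = 2.0 * sum(Q[i][k + 1 + j] * v[j] for j in range(m)) / vnorm2
            for j in range(m):
                Q[i][k + 1 + j] -= f * v[j]
    return H, Q


if __name__ == "__main__":
    # A small nonsymmetric example matrix.
    A = [[4.0, 1.0, 2.0, 3.0],
         [2.0, 3.0, 0.0, 1.0],
         [1.0, 5.0, 2.0, 1.0],
         [3.0, 1.0, 4.0, 5.0]]
    H, Q = hessenberg(A)
    for row in H:
        print(["%8.4f" % h for h in row])
```

Because H is similar to A, it has the same eigenvalues, and eigenvalue solvers then iterate on the cheaper Hessenberg form. Blocked library versions organize the same reflectors into panels, and the right-hand (second-side) updates touch the whole trailing matrix. That is one reason two-sided factorizations are harder to accelerate than one-sided ones such as LU or QR.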
See http://www.nvidia.com/object/cuda_home.html#state=detailsOpen;aid=f3c0c426-1df9-4fb3-b9f4-1bd7f62a6978 on CUDA Zone.
Posted on 07/30/2009