Sophisticated parallel matrix multiplication algorithms like PDGEMM exhibit a complex structure and can be controlled by a large set of parameters including blocking factors and bl...
Achieving peak performance in important numerical kernels such as dense matrix multiply or sparse-matrix vector multiplication usually requires extensive, machine-dependent tuning ...
During the last half-decade, a number of research efforts have centered around developing software for generating automatically tuned matrix multiplication kernels. These include ...
John A. Gunnels, Fred G. Gustavson, Greg Henry, Ro...
Achieving peak performance from library subroutines usually requires extensive, machine-dependent tuning by hand. Automatic tuning systems have emerged in response, and they typic...