—We present an application programming interface (API) for the C programming language that facilitates the development of dense linear algebra algorithms on graphics processors a...
On modern architectures, the performance of 32-bit operations is often at least twice as fast as the performance of 64-bit operations. By using a combination of 32-bit and 64-bit ...
Marc Baboulin, Alfredo Buttari, Jack Dongarra, Jak...
QR decomposition is a computationally intensive linear algebra operation that factors a matrix A into the product of a unitary matrix Q and upper triangular matrix R. Adaptive sys...
On modern computers, the performance of programs is often limited by memory latency rather than by processor cycle time. To reduce the impact of memory latency, the restructuring ...
Induprakas Kodukula, Keshav Pingali, Robert Cox, D...