Sciweavers

12 search results - page 2 / 3
» Mechanizing the expert dense linear algebra developer
Sort
View
PARA
1995
Springer
13 years 9 months ago
A Proposal for a Set of Parallel Basic Linear Algebra Subprograms
This paper describes a proposal for a set of Parallel Basic Linear Algebra Subprograms PBLAS. The PBLAS are targeted at distributed vector-vector, matrix-vector and matrixmatrix...
Jaeyoung Choi, Jack Dongarra, Susan Ostrouchov, An...
ASPLOS
2009
ACM
14 years 6 months ago
QR decomposition on GPUs
QR decomposition is a computationally intensive linear algebra operation that factors a matrix A into the product of a unitary matrix Q and upper triangular matrix R. Adaptive sys...
Andrew Kerr, Dan Campbell, Mark Richards
ICCS
2009
Springer
14 years 9 days ago
A Note on Auto-tuning GEMM for GPUs
The development of high performance dense linear algebra (DLA) critically depends on highly optimized BLAS, and especially on the matrix multiplication routine (GEMM). This is espe...
Yinan Li, Jack Dongarra, Stanimire Tomov
IPPS
2002
IEEE
13 years 10 months ago
Optimizing Graph Algorithms for Improved Cache Performance
Tiling has long been used to improve cache performance. Recursion has recently been used as a cache-oblivious method of improving cache performance. Both of these techniques are n...
Joon-Sang Park, Michael Penner, Viktor K. Prasanna
ICS
1999
Tsinghua U.
13 years 10 months ago
An experimental evaluation of tiling and shackling for memory hierarchy management
On modern computers, the performance of programs is often limited by memory latency rather than by processor cycle time. To reduce the impact of memory latency, the restructuring ...
Induprakas Kodukula, Keshav Pingali, Robert Cox, D...