Modern out-of-order processors tolerate long latency memory operations by supporting a large number of inflight instructions. This is particularly useful in numerical applications...
This paper describes a proposal for a set of Parallel Basic Linear Algebra Subprograms PBLAS. The PBLAS are targeted at distributed vector-vector, matrix-vector and matrixmatrix...
Jaeyoung Choi, Jack Dongarra, Susan Ostrouchov, An...
High-level languages are growing in popularity. However, decades of C software development have produced large libraries of fast, timetested, meritorious code that are impractical...
Tristan Ravitch, Steve Jackson, Eric Aderhold, Ben...