We study the problem of sorting on a parallel computer with limited communication bandwidth. By using the PRAM(m) model, where p processors communicate through a globally shared me...
In LAPACK many matrix operations are cast as block algorithms which iteratively process a panel using an unblocked algorithm and then update a remainder matrix using the high perf...
Large graph analysis has become increasingly important and is widely used in many applications such as web mining, social network analysis, biology, and information retrieval. The...
This paper describes a proposal for a set of Parallel Basic Linear Algebra Subprograms PBLAS. The PBLAS are targeted at distributed vector-vector, matrix-vector and matrixmatrix...
Jaeyoung Choi, Jack Dongarra, Susan Ostrouchov, An...
Conventional autotuning configuration of parameters in distributed computing systems using evolutionary strategies increases integrated performance notably, though at the expense ...