We consider two-dimensional partitioning of general sparse matrices for parallel sparse matrix-vector multiply operation. We present three hypergraph-partitioning-based methods, ea...
We present a fast and scalable matrix multiplication algorithm on distributed memory concurrent computers, whose performance is independent of data distribution on processors, and...
Cache-based, general purpose CPUs perform at a small fraction of their maximum floating point performance when executing memory-intensive simulations, such as those required for sp...
Consider any known sequential algorithm for matrix multiplication over an arbitrary ring with time complexity ON , where 2 3. We show that such an algorithm can be parallelize...
We present a new fast and scalable matrix multiplication algorithm, called DIMMA Distribution-Independent Matrix Multiplication Algorithm, for block cyclic data distribution on ...