Sciweavers

IPPS
1997
IEEE

A Fast Scalable Universal Matrix Multiplication Algorithm on Distributed-Memory Concurrent Computers

We present DIMMA (Distribution-Independent Matrix Multiplication Algorithm), a fast and scalable matrix multiplication algorithm for distributed-memory concurrent computers whose performance is independent of how the data are distributed across processors. The algorithm is based on two new ideas: it uses a modified pipelined communication scheme to overlap computation and communication effectively, and it exploits the LCM block concept to obtain the maximum performance of the sequential BLAS routine on each processor, whether the block size is very small or very large. The algorithm is implemented and compared with SUMMA on the Intel Paragon computer.
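As a rough illustration of why the least common multiple of the process-grid dimensions is a natural unit in this setting, the sketch below assumes a 2D block-cyclic layout on a P x Q process grid. That layout is an assumption here, since the abstract does not spell it out. The helper names `owner` and `panel_owners` are hypothetical and are not taken from the paper. The code is not the DIMMA algorithm itself. It only shows that the pair of processes owning the k-th column block of A and the k-th row block of B repeats with period lcm(P, Q).

```python
from math import gcd


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


def owner(i, j, P, Q):
    """Grid coordinates of the process holding block (i, j).

    Assumes a 2D block-cyclic layout on a P x Q process grid
    (an assumption for this sketch, not stated in the abstract).
    """
    return (i % P, j % Q)


def panel_owners(k, P, Q):
    """Owners involved in step k of an outer-product style multiply.

    Returns (process column holding column block k of A,
             process row holding row block k of B).
    """
    return (k % Q, k % P)


# Example grid: 2 process rows, 3 process columns (hypothetical values).
P, Q = 2, 3
period = lcm(P, Q)

# Ownership pattern over two full periods of steps.
pattern = [panel_owners(k, P, Q) for k in range(2 * period)]

# Steps whose column block of A lives on process column 0.
steps_on_column_0 = [k for k in range(2 * period) if k % Q == 0]

for k, (pcol, prow) in enumerate(pattern):
    print(f"step {k}: A panel on process column {pcol}, B panel on process row {prow}")
```

Because the pattern repeats every lcm(P, Q) steps, blocks at the same relative position in consecutive periods sit on the same processes. That is what makes it plausible to group several thin panels held by one process into a single, larger local multiply.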
J. Choi
Added 26 Aug 2010
Updated 26 Aug 2010
Type Conference
Year 1997
Where IPPS
Authors J. Choi