A Fast Scalable Universal Matrix Multiplication Algorithm on Distributed-Memory Concurrent Computers

We present a fast and scalable matrix multiplication algorithm for distributed-memory concurrent computers whose performance is independent of the data distribution on the processors, and call it DIMMA (Distribution-Independent Matrix Multiplication Algorithm). The algorithm is based on two new ideas: it uses a modified pipelined communication scheme to overlap computation and communication effectively, and it exploits the LCM block concept to obtain the maximum performance of the sequential BLAS routine on each processor, whether the block size is very small or very large. The algorithm is implemented and compared with SUMMA on the Intel Paragon computer.
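
The abstract does not define the LCM block, so the following is only a minimal sketch of one plausible reading, not the paper's implementation. It assumes a 2-D block-cyclic layout on a P x Q process grid, in which the k-th column block of A sits in process column k mod Q and the k-th row block of B sits in process row k mod P. Under that assumption the owner pair repeats with period lcm(P, Q), so blocks k, k + lcm(P, Q), k + 2 lcm(P, Q), ... could be multiplied together in one larger local BLAS call. The names `owner` and `group_blocks_by_owner` are hypothetical.

```python
from math import gcd


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


def owner(k, nprocs):
    """Process index holding block k under a 1-D block-cyclic layout (assumed)."""
    return k % nprocs


def group_blocks_by_owner(num_blocks, P, Q):
    """Group block indices k that share the same owners for A and B.

    Assumption: A's k-th column block lives in process column k % Q and
    B's k-th row block lives in process row k % P. Indices that agree
    modulo lcm(P, Q) then agree on both owners.
    """
    period = lcm(P, Q)
    groups = {}
    for k in range(num_blocks):
        groups.setdefault(k % period, []).append(k)
    return period, groups


if __name__ == "__main__":
    period, groups = group_blocks_by_owner(num_blocks=12, P=2, Q=3)
    print("period:", period)
    for first, members in sorted(groups.items()):
        k0 = members[0]
        print(members, "-> process row", owner(k0, 2), ", process column", owner(k0, 3))
```

Grouping the indices this way is what would let a small distribution block size still feed the local matrix multiply with a larger effective block, which is the benefit the abstract attributes to the LCM block concept.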

J. Choi

Distributed And Parallel Computing | Distribution-Independent Matrix Multiplication | IPPS 1997 | Matrix Multiplication Algorithm | Scalable Matrix Multiplication |

Added	26 Aug 2010
Updated	26 Aug 2010
Type	Conference
Year	1997
Where	IPPS
Authors	J. Choi
