This paper explains why parallel implementation of matrix multiplication--a seemingly simple algorithm that can be expressed as one statement and three nested loops--is complex: P...
John A. Gunnels, Calvin Lin, Greg Morrow, Robert A...
We present a new fast and scalable matrix multiplication algorithm, called DIMMA Distribution-Independent Matrix Multiplication Algorithm, for block cyclic data distribution on ...
We present a fast and scalable matrix multiplication algorithm on distributed memory concurrent computers, whose performance is independent of data distribution on processors, and...
We present a simple implementation of a token-based distributed mutual exclusion algorithm for multithreaded systems. Several per-node requests could be issued by threads running ...
In this paper, an adaptive matrix multiplication algorithm for dynamic heterogeneous environments is developed and evaluated. Unlike the state-of-the-art approaches, where load ba...