Abstract. This article presents the C++ library vShark which reduces the intranode communication overhead of parallel programs on clusters of SMPs. The library is built on top of m...
We consider the realization of matrix-matrix multiplication and propose a hierarchical algorithm implemented in a task-parallel way using multiprocessor tasks on distributed memory...
Thread migration is established as a mechanism for achieving dynamic load sharing. However, fine-grained migration has not been used due to the high thread and messaging overheads...
This paper describes Shasta, a system that supports a shared address space in software on clusters of computers with physically distributed memory. A unique aspect of Shasta compa...
Daniel J. Scales, Kourosh Gharachorloo, Chandramoh...