With the ever-increasing numbers of cores per node on HPC systems, applications are increasingly using threads to exploit the shared memory within a node, combined with MPI across ...
We show how the traditional protocol stack, such as TCP/IP, can be eliminated for socket based high speed communication within a cluster. The SCI shared memory interconnect is used...
This paper discusses some of the issues involved in implementing a shared-address space programming model on large-scale, distributed-memory multiprocessors. While such a programm...
David A. Kranz, Kirk L. Johnson, Anant Agarwal, Jo...
Synchronization operations, such as fence and locking, are used in many parallel operations accessing shared memory. However, a process which is blocked waiting for a fence operat...
Darius Buntinas, Amina Saify, Dhabaleswar K. Panda...
A key challenge in achieving high performance on software DSM systems is overcoming their relatively large communication latencies. In this paper, we consider two techniques which...