This paper introduces a programming interface called PARRAY (or Parallelizing ARRAYs) that supports system-level succinct programming for heterogeneous parallel systems like GPU c...
Abstract. This article presents the C++ library vShark which reduces the intranode communication overhead of parallel programs on clusters of SMPs. The library is built on top of m...
Abstract. Distributing process-oriented programs across a cluster of machines requires careful attention to the effects of network latency. The MPI standard, widely used for cluste...
The ROMIO implementation of the MPI-IO standard provides a portable infrastructure for use on top of any number of different underlying storage targets. These different targets va...
Robert B. Ross, Robert Latham, William Gropp, Raje...
—Achieving high-performance message passing on top of generic ETHERNET hardware suffers from the NIC interruptdriven model where coalescing is usually involved. We present an in-...