—Achieving high-performance message passing on top of generic ETHERNET hardware suffers from the NIC interruptdriven model where coalescing is usually involved. We present an in-...
With the availability of inexpensive blade servers featuring 32 GB or more of main memory, memory-based engines such as the SAP NetWeaver Business Warehouse Accelerator are coming...
Olga Mordvinova, Oleksandr Shepil, Thomas Ludwig 0...
The emergence of multicore processors raises the need to efficiently transfer large amounts of data between local processes. MPICH2 is a highly portable MPI implementation whose l...
Darius Buntinas, Brice Goglin, David Goodell, Guil...
Several existing compiler transformations can help improve communication-computation overlap in MPI applications. However, traditional compilers treat calls to the MPI library as ...
Anthony Danalis, Lori L. Pollock, D. Martin Swany,...
Abstract—Multi-core systems are now extremely common in modern clusters. In the past commodity systems may have had up to two or four CPUs per compute node. In modern clusters, t...