Fault tolerance in MPI has become a major issue in the HPC community. Several approaches are envisioned, from user- or programmer-controlled fault tolerance to fully automatic fault ...
Aurelien Bouteiller, Boris Collin, Thomas Hé...
The InfiniBand™ Architecture (IBA) is a promising new I/O communication standard positioned for building clusters and System Area Networks (SANs). However, the IBA specification ...
With the latest high-end computing nodes combining shared-memory multiprocessing with hardware multithreading, new scheduling policies are necessary for workloads consisting of mu...
Robert L. McGregor, Christos D. Antonopoulos, Dimi...
The problem of writing high performance parallel applications becomes even more challenging when irregular, sparse or adaptive methods are employed. In this paper we introduce com...
The need to provide performance guarantees in high-performance servers has long been neglected. Providing performance guarantees in current and future servers is difficult because ...