—Massively parallel scientific applications, running on extreme-scale supercomputers, produce hundreds of terabytes of data per run, driving the need for storage solutions to im...
Ramya Prabhakar, Sudharshan S. Vazhkudai, Youngjae...
Long running High Performance Computing (HPC) applications at scale must be able to tolerate inevitable faults if they are to harness current and future HPC systems. Message Passi...
The Message Passing Interface (MPI) is a standard in parallel computing, and can also be used as a highperformance programming model for Grid application development. How to execu...
Parallel file systems are widely used in clusters to provide high performance I/O. However, most of the existing parallel file systems are based on UNIX-like operating systems. W...
This paper presents an experimental evaluation of the fault-tolerant communication (FTCOM) layer of the DECOS integrated architecture. The FTCOM layer implements different agreemen...
Jonny Vinter, Henrik Eriksson, Astrit Ademaj, Bern...