Many parallel applications from scientific computing use MPI collective communication operations to collect or distribute data. Since the execution times of these communication op...
Today's complex applications must face the distribution of data and code among different network nodes. Computation in distributed contexts is demanding increasingly powerful...
Execution of applications on upcoming high-performance computing (HPC) systems introduces a variety of new challenges and amplifies many existing ones. These systems will...
Avneesh Pant, Hassan Jafri, Volodymyr V. Kindraten...
Self-healing systems focus on reducing the complexity and cost of managing dependability policies and mechanisms without human intervention. This position paper pr...
Group communication protocols constitute a basic building block for highly dependable distributed applications. Designing and correctly implementing a group communication system (...
Claudio Basile, Long Wang, Zbigniew Kalbarczyk, Ra...