Fault tolerance is a very important concern for critical high-performance applications using the MPI library. Several protocols provide automatic and transparent fault detection a...
Pierre Lemarinier, Aurelien Bouteiller, Thomas H&e...
In this paper we address the problem of reducing the energy consumption in distributed embedded systems associated with time-constraints and equipped with fault-tolerant technique...
Recent developments in the field of object-based fault tolerance and the advent of the first OMG FTCORBA compliant middleware raise new requirements for the design process of dist...
We present a new approach to fault tolerance for High Performance Computing systems. Our approach is based on a careful adaptation of the Algorithmic Based Fault Tolerance techniq...
George Bosilca, Remi Delmas, Jack Dongarra, Julien...