Sciweavers

IPPS
2005
IEEE

Impact of Event Logger on Causal Message Logging Protocols for Fault Tolerant MPI

Fault tolerance in MPI has become a major issue in the HPC community. Several approaches are envisioned, from user- or programmer-controlled fault tolerance to fully automatic fault detection and handling. For the latter approach, several protocols have been proposed in the literature. In a recent paper, we demonstrated that uncoordinated checkpointing tolerates a higher fault frequency than coordinated checkpointing. Moreover, causal message logging protocols have been shown to be the most efficient message logging technique. These protocols piggyback nondeterministic events on computation messages. Several such protocols have been proposed in the literature. Their merits are usually evaluated using four metrics: a) piggybacking computation cost, b) piggyback size, c) application performance and d) fault recovery performance. In this paper, we investigate the benefit of using stable storage for logging message events in causal message logging protocols. To evaluate the advantag...
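
The piggybacking idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Determinant`, `EventLogger`, `Process` and `ring_piggyback_sizes` are hypothetical, the event logger is assumed to acknowledge every event synchronously, and the pruning rules that real causal protocols apply without a logger are omitted. Under those assumptions, it shows how the piggyback size (metric b) grows when no stable storage is available and stays empty when events are logged as soon as they are created.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Determinant:
    """Hypothetical record of one nondeterministic event (a message delivery)."""
    receiver: int
    sender: int
    send_seq: int  # sequence number assigned by the sender
    recv_seq: int  # delivery order at the receiver


class EventLogger:
    """Hypothetical stable storage that records determinants."""

    def __init__(self):
        self.stable = set()

    def log(self, det):
        # Assumption: the acknowledgement is immediate (synchronous).
        self.stable.add(det)


@dataclass
class Process:
    rank: int
    logger: Optional[EventLogger] = None
    unstable: set = field(default_factory=set)  # events not known to be stable
    send_seq: int = 0
    recv_seq: int = 0

    def send(self, dest):
        self.send_seq += 1
        # Piggyback every determinant not yet known to be on stable storage.
        return {"src": self.rank, "dst": dest, "seq": self.send_seq,
                "piggyback": frozenset(self.unstable)}

    def receive(self, msg):
        # Merge the causal past carried by the message.
        self.unstable |= msg["piggyback"]
        # The delivery itself is a new nondeterministic event.
        self.recv_seq += 1
        det = Determinant(self.rank, msg["src"], msg["seq"], self.recv_seq)
        self.unstable.add(det)
        if self.logger is not None:
            self.logger.log(det)
            # Acknowledged events no longer need to be piggybacked.
            self.unstable -= self.logger.stable


def ring_piggyback_sizes(n_procs, n_hops, use_logger):
    """Pass a token around a ring and record the piggyback size of each send."""
    logger = EventLogger() if use_logger else None
    procs = [Process(rank=r, logger=logger) for r in range(n_procs)]
    sizes = []
    current = 0
    for _ in range(n_hops):
        nxt = (current + 1) % n_procs
        msg = procs[current].send(nxt)
        sizes.append(len(msg["piggyback"]))
        procs[nxt].receive(msg)
        current = nxt
    return sizes


if __name__ == "__main__":
    print("without event logger:", ring_piggyback_sizes(3, 6, False))
    print("with event logger:   ", ring_piggyback_sizes(3, 6, True))
```

In this toy ring, the piggyback grows by one determinant per hop without the logger and remains empty with it. With asynchronous acknowledgements, as one would expect in a real system, the reduction would be partial rather than total.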
Added 25 Jun 2010
Updated 25 Jun 2010
Type Conference
Year 2005
Where IPPS
Authors Aurelien Bouteiller, Boris Collin, Thomas Hérault, Pierre Lemarinier, Franck Cappello