Sciweavers

CCGRID
2010
IEEE
13 years 6 months ago
Team-Based Message Logging: Preliminary Results
Fault tolerance will be a fundamental imperative in the next decade as machines containing hundreds of thousands of cores will be installed at various locations. In this context, ...
Esteban Meneses, Celso L. Mendes, Laxmikant V. Kal...
CLUSTER
2004
IEEE
13 years 8 months ago
Improved message logging versus improved coordinated checkpointing for fault tolerant MPI
Fault tolerance is a very important concern for critical high performance applications using the MPI library. Several protocols provide automatic and transparent fault detection a...
Pierre Lemarinier, Aurelien Bouteiller, Thomas H...
SRDS
1998
IEEE
13 years 9 months ago
The Cost of Recovery in Message Logging Protocols
Sriram Rao, Lorenzo Alvisi, Harrick M. Vin
EDCC
2005
Springer
13 years 10 months ago
Performance Evaluation of Consistent Recovery Protocols Using MPICH-GF
This paper presents an implementation of several consistent protocols at the abstract device level and their performance comparison. We have performed experiments using three NAS P...
Namyoon Woo, Hyungsoo Jung, Dongin Shin, Hyuck Han...
IPPS
2007
IEEE
13 years 11 months ago
A Fault Tolerance Protocol with Fast Fault Recovery
Fault tolerance is an important issue for large machines with tens or hundreds of thousands of processors. Checkpoint-based methods, currently used on most machines, roll back all ...
Sayantan Chakravorty, Laxmikant V. Kalé