Sciweavers

37 search results - page 3 / 8
» The Cost of Recovery in Message Logging Protocols
Sort
View
ICPP
1999
IEEE
13 years 10 months ago
Coherence-Centric Logging and Recovery for Home-Based Software Distributed Shared Memory
The probability of failures in software distributed shared memory (SDSM) increases as the system size grows. This paper introduces a new, efficient message logging technique, call...
Angkul Kongmunvattana, Nian-Feng Tzeng
IPPS
2007
IEEE
14 years 14 days ago
A Fault Tolerance Protocol with Fast Fault Recovery
Fault tolerance is an important issue for large machines with tens or hundreds of thousands of processors. Checkpoint-based methods, currently used on most machines, rollback all ...
Sayantan Chakravorty, Laxmikant V. Kalé
CLUSTER
2004
IEEE
13 years 10 months ago
Improved message logging versus improved coordinated checkpointing for fault tolerant MPI
Fault tolerance is a very important concern for critical high performance applications using the MPI library. Several protocols provide automatic and transparent fault detection a...
Pierre Lemarinier, Aurelien Bouteiller, Thomas H&e...
ICDCS
1997
IEEE
13 years 10 months ago
Distributed Recovery with K-Optimistic Logging
Fault-tolerance techniques based on checkpointing and message logging have been increasingly used in real-world applications to reduce service down-time. Most industrial applicati...
Yi-Min Wang, Om P. Damani, Vijay K. Garg
SRDS
1996
IEEE
13 years 10 months ago
Minimizing Timestamp Size for Completely Asynchronous Optimistic Recovery with Minimal Rollback
Basing rollback recovery on optimistic message logging and replay avoids the need for synchronization between processes during failure-free execution. Some previous research has a...
Sean W. Smith, David B. Johnson