Sciweavers

67 search results - page 4 / 14
» Recovery Protocol for Mobile Checkpointing
PODC
1994
ACM
15 years 1 month ago
A Checkpoint Protocol for an Entry Consistent Shared Memory System
Workstation clusters are becoming an interesting alternative to dedicated multiprocessors. In this environment, the probability of a failure, during an application's executio...
Nuno Neves, Miguel Castro, Paulo Guedes
EDCC
2005
Springer
15 years 3 months ago
Performance Evaluation of Consistent Recovery Protocols Using MPICH-GF
This paper presents an implementation of several consistent protocols at the abstract device level and their performance comparison. We have performed experiments using three NAS P...
Namyoon Woo, Hyungsoo Jung, Dongin Shin, Hyuck Han...
ICDCS
2000
IEEE
15 years 2 months ago
On Low-Cost Error Containment and Recovery Methods for Guarded Software Upgrading
To assure dependable onboard evolution, we have developed a methodology called guarded software upgrading (GSU). In this paper, we focus on a low-cost approach to error containmen...
Ann T. Tai, Kam S. Tso, Leon Alkalai, Savio N. Cha...
CLUSTER
2004
IEEE
15 years 1 month ago
Improved message logging versus improved coordinated checkpointing for fault tolerant MPI
Fault tolerance is a very important concern for critical high performance applications using the MPI library. Several protocols provide automatic and transparent fault detection a...
Pierre Lemarinier, Aurelien Bouteiller, Thomas H...