JPDC
2007

Self-stabilizing algorithm for checkpointing in a distributed system

If the variables used by a checkpointing algorithm suffer data faults, existing checkpointing and recovery algorithms may fail. In this paper, self-stabilizing algorithms for data fault detection and correction, checkpointing, and recovery are proposed for a ring topology. The proposed data fault detection and correction algorithms can handle at most one data fault per process, but in any number of processes. The proposed checkpointing algorithm can deal with data faults and with multiple concurrent initiations of checkpointing. Using the proposed recovery algorithm, a process can recover from a fault even when multiple data faults are present in the system. All the proposed algorithms converge in O(n) steps, where n is the number of processes. The algorithms can be extended to work for general topologies too. © 2007 Elsevier Inc. All rights reserved.
Partha Sarathi Mandal, Krishnendu Mukhopadhyaya
Added 16 Dec 2010
Updated 16 Dec 2010
Type Journal
Year 2007
Where JPDC
Authors Partha Sarathi Mandal, Krishnendu Mukhopadhyaya