Self-stabilizing algorithm for checkpointing in a distributed system

If the variables used by a checkpointing algorithm suffer data faults, existing checkpointing and recovery algorithms may fail. This paper proposes self-stabilizing algorithms for data fault detection and correction, checkpointing, and recovery in a ring topology. The proposed data fault detection and correction algorithms can handle at most one data fault per process, in any number of processes. The proposed checkpointing algorithm can deal with multiple concurrent initiations of checkpointing as well as data faults. Using the proposed recovery algorithm, a process can recover from a fault even when multiple data faults are present in the system. All the proposed algorithms converge in O(n) steps, where n is the number of processes. The algorithm can also be extended to work for general topologies. © 2007 Elsevier Inc. All rights reserved.
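
The abstract does not give the algorithms themselves, so the following is only an illustration of what self-stabilization on a ring means, not the method proposed in the paper. It is a minimal Python sketch of Dijkstra's classic K-state token ring, a different and much simpler algorithm: starting from an arbitrary (possibly corrupted) assignment of the state variables, the ring reaches a legitimate configuration with exactly one privileged process and stays there. The function names (`privileged`, `step`, `stabilize`), the random scheduler, and the `max_steps` guard are assumptions made for this example, and its convergence bound is not the O(n) bound claimed for the paper's algorithms.

```python
import random


def privileged(x):
    """Return the indices of processes that currently hold a privilege.

    x[i] is the state of process i on a unidirectional ring; process i
    reads the state of its predecessor (process 0 reads process n-1).
    """
    n = len(x)
    holders = []
    if x[0] == x[n - 1]:          # rule for the distinguished process 0
        holders.append(0)
    for i in range(1, n):         # rule for every other process
        if x[i] != x[i - 1]:
            holders.append(i)
    return holders


def step(x, i, k):
    """Let privileged process i make its move (k = number of states, k >= n)."""
    if i == 0:
        x[0] = (x[0] + 1) % k     # process 0 advances its counter
    else:
        x[i] = x[i - 1]           # others copy their predecessor


def stabilize(x, k, rng, max_steps=10_000):
    """Run moves until exactly one process is privileged.

    A randomly chosen privileged process moves at each step (a central
    scheduler). Returns the number of moves taken. max_steps is only a
    safety guard for this sketch.
    """
    steps = 0
    while len(privileged(x)) != 1:
        if steps >= max_steps:
            raise RuntimeError("did not stabilize within max_steps")
        step(x, rng.choice(privileged(x)), k)
        steps += 1
    return steps


if __name__ == "__main__":
    rng = random.Random(0)
    n, k = 6, 7
    # Arbitrary initial state, standing in for variables hit by data faults.
    state = [rng.randrange(k) for _ in range(n)]
    moves = stabilize(state, k, rng)
    print("stabilized after", moves, "moves; state:", state)
```

Once `stabilize` returns, every further move passes the single privilege to the next process on the ring, so the legitimate configuration is preserved. That is the closure half of self-stabilization; the loop itself demonstrates the convergence half.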

Partha Sarathi Mandal, Krishnendu Mukhopadhyaya