PODC
1994
ACM

A Checkpoint Protocol for an Entry Consistent Shared Memory System

Workstation clusters are becoming an interesting alternative to dedicated multiprocessors. In this environment, the probability of a failure during an application's execution increases with the execution time and the number of workstations used. If no provision is made for handling failures, it is unlikely that long-running applications will terminate successfully. One solution to this problem is process checkpointing. This paper presents a checkpoint protocol for a multithreaded distributed shared memory system based on the entry consistency memory model. The protocol allows transparent recovery from single-node failures and, in some cases, from multiple-node failures. A simple mechanism is used to determine whether the system can be brought to a consistent state in the event of multiple machine crashes. The protocol keeps a distributed log of shared data accesses in the volatile memory of the processes, taking advantage of the independent failure characteristics of workstation clu...
Nuno Neves, Miguel Castro, Paulo Guedes
Added: 10 Aug 2010
Updated: 10 Aug 2010
Type: Conference
Year: 1994
Where: PODC
Authors: Nuno Neves, Miguel Castro, Paulo Guedes