Sciweavers

JFP
2010

Lightweight checkpointing for concurrent ML

Transient faults that arise in large-scale software systems can often be repaired by re-executing the code in which they occur. Ascribing a meaningful semantics to safe re-execution in multithreaded code is not obvious, however. For a thread to correctly re-execute a region of code, it must ensure that all other threads that have witnessed its unwanted effects within that region are also reverted to a meaningful earlier state. If this is not done properly, data inconsistencies and other undesirable behavior may result. However, automatically determining what constitutes a consistent global checkpoint is not straightforward, since thread interactions are a dynamic property of the program. In this paper, we present a safe and efficient checkpointing mechanism for Concurrent ML (CML) that can be used to recover from transient faults. We introduce a new linguistic abstraction called stabilizers that permits the specification of per-thread monitors and the restoration of globally consistent checkpoints. ...
Lukasz Ziarek, Suresh Jagannathan
Added 28 Jan 2011
Updated 28 Jan 2011
Type Journal
Year 2010
Where JFP
Authors Lukasz Ziarek, Suresh Jagannathan