
ICDCS
2003
IEEE

Software Fault Tolerance of Distributed Programs Using Computation Slicing

Writing correct distributed programs is hard. In spite of extensive testing and debugging, software faults persist even in commercial-grade software. Many distributed systems, especially those employed in safety-critical environments, should be able to operate properly even in the presence of software faults. Monitoring the execution of a distributed system and, on detecting a fault, initiating the appropriate corrective action is an important way to tolerate such faults. This gives rise to the predicate detection problem, which involves finding a consistent cut of a distributed computation, if one exists, that satisfies the given global predicate. Detecting a predicate in a computation is, however, an NP-complete problem. To ameliorate the associated combinatorial explosion, we introduced the notion of a computation slice in our earlier papers [5, 10]. Intuitively, a slice is a concise representation of those consistent cuts that satisfy a certain condition. To detect a predicate...
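To make the predicate detection problem concrete, here is a minimal Python sketch that enumerates every cut of a tiny computation, keeps the consistent ones, and tests a global predicate on each. It is a brute-force illustration only, not the slicing technique the abstract refers to, and its cost grows exponentially with the number of processes. The two-process computation, the vector-clock encoding, and the names `COMPUTATION`, `INITIAL`, `is_consistent`, `global_state`, and `detect` are all assumptions made for this example.

```python
from itertools import product

# Hypothetical toy computation with two processes.
# Each event is (vector_clock, local_value_after_event).
# The second event of P0 sends a message that the first event of P1 receives.
P0 = [((1, 0), 1), ((2, 0), 2)]
P1 = [((2, 1), 1), ((2, 2), 2)]
COMPUTATION = [P0, P1]
INITIAL = [0, 0]  # local values before any event has executed


def is_consistent(cut, computation):
    """A cut (events taken per process) is consistent if it contains every
    event that happened before each event it includes. With vector clocks it
    suffices to check the last included event of every process."""
    for i, count in enumerate(cut):
        if count == 0:
            continue
        clock = computation[i][count - 1][0]
        if any(clock[j] > cut[j] for j in range(len(cut))):
            return False
    return True


def global_state(cut, computation, initial):
    """Local value of each process after the events included in the cut."""
    return [
        initial[i] if count == 0 else computation[i][count - 1][1]
        for i, count in enumerate(cut)
    ]


def detect(predicate, computation, initial):
    """Return all consistent cuts whose global state satisfies the predicate."""
    ranges = [range(len(events) + 1) for events in computation]
    return [
        cut
        for cut in product(*ranges)
        if is_consistent(cut, computation)
        and predicate(global_state(cut, computation, initial))
    ]


if __name__ == "__main__":
    # Which consistent cuts have local values summing to 3?
    print(detect(lambda state: state[0] + state[1] == 3, COMPUTATION, INITIAL))
```

In this toy computation the cut `(1, 2)` also has local values summing to 3, but it includes the receive event without the matching send, so it is rejected as inconsistent. Checking every cut in this way is what becomes infeasible as the computation grows.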
Neeraj Mittal, Vijay K. Garg
Added: 04 Jul 2010
Updated: 04 Jul 2010
Type: Conference
Year: 2003
Where: ICDCS
Authors: Neeraj Mittal, Vijay K. Garg