Fast and accurate fault detection is becoming an essential component of management software for mission-critical systems. A good fault detector makes it possible to initiate repair a...
Fault tolerance schemes for mobile agents to survive agent server crash failures are complex since developers normally have no control over remote agent servers. Some solutions mo...
An accurate and up-to-date diagnostic model is critical for economic aircraft engine operation. However, for many commercial airline fleets, monitoring and diagnosing engine fault...
LiJie Yu, Daniel J. Cleary, Mark D. Osborn, Vrinda...
In this paper, we describe the design and implementation of two mechanisms for fault-tolerance and recovery for complex scientific workflows on computational grids. We present our ...
RPC is one of the programming models envisioned for the Grid. In Internet-connected Large Scale Grids such as Desktop Grids, node and network failures are not rare events. This ...