Sciweavers

TPDS
1998
14 years 9 months ago
On Coordinated Checkpointing in Distributed Systems
Coordinated checkpointing simplifies failure recovery and eliminates domino effects in case of failures by preserving a consistent global checkpoint on stable storage. However, ...
Guohong Cao, Mukesh Singhal
HPDC
2000
IEEE
15 years 2 months ago
Failure-Atomic File Access in an Interposed Network Storage System
This paper presents a recovery protocol for block I/O operations in Slice, a storage system architecture for high-speed LANs incorporating network-attached block storage. The goal ...
Darrell C. Anderson, Jeffrey S. Chase
WOSS
2004
ACM
15 years 3 months ago
Combining statistical monitoring and predictable recovery for self-management
Complex distributed Internet services form the basis not only of e-commerce but increasingly of mission-critical network-based applications. What is new is that the workload and in...
Armando Fox, Emre Kiciman, David A. Patterson
FTCS
1993
14 years 11 months ago
Virtually-Synchronous Communication Based on a Weak Failure Suspector
Failure detectors (or, more accurately, Failure Suspectors, FS) appear to be a fundamental service upon which to build fault-tolerant, distributed applications. This paper shows t...
André Schiper, Aleta Ricciardi
IJCAI
1989
14 years 10 months ago
Using and Refining Simplifications: Explanation-Based Learning of Plans in Intractable Domains
This paper describes an explanation-based approach to learning plans despite a computationally intractable domain theory. In this approach, the system learns an initial plan using...
Steve A. Chien