Sciweavers

342 search results - page 2 / 69
» A planning based approach to failure recovery in distributed...
ICAC
2005
IEEE
13 years 11 months ago
Distributed Troubleshooting Agents
Key issues to address in autonomic job recovery for cluster computing are recognizing job failure; understanding the failure sufficiently to know if and how to restart the job; an...
Charles Earl, Emilio Remolina, Jim Ong, John Brown
ICDCS
1997
IEEE
13 years 10 months ago
Distributed Recovery with K-Optimistic Logging
Fault-tolerance techniques based on checkpointing and message logging have been increasingly used in real-world applications to reduce service down-time. Most industrial applicati...
Yi-Min Wang, Om P. Damani, Vijay K. Garg
ISORC
2009
IEEE
14 years 12 days ago
Fault-Tolerance for Component-Based Systems - An Automated Middleware Specialization Approach
General-purpose middleware, by definition, cannot readily support domain-specific semantics without significant manual efforts in specializing the middleware. This paper prese...
Sumant Tambe, Akshay Dabholkar, Aniruddha S. Gokha...
IPPS
2006
IEEE
13 years 11 months ago
Load balancing in the presence of random node failure and recovery
In many distributed computing systems that are prone to either induced or spontaneous node failures, the number of available computing resources is dynamically changing in a rando...
Sagar Dhakal, Majeed M. Hayat, Jorge E. Pezoa, Cha...
ICECCS
1997
IEEE
13 years 10 months ago
Cache based fault recovery for distributed systems
No cache based techniques for roll-forward fault recovery exist at present. A split-cache approach is proposed that provides efficient support for checkpointing and roll-forward fau...
Avi Mendelson, Neeraj Suri