Sciweavers

89 search results - page 3 / 18
» The overhead of consensus failure recovery
CCGRID
2010
IEEE
13 years 7 months ago
Selective Recovery from Failures in a Task Parallel Programming Model
We present a fault tolerant task pool execution environment that is capable of performing fine-grain selective restart using a lightweight, distributed task completion tr...
James Dinan, Arjun Singri, P. Sadayappan, Sriram K...
AOSD
2007
ACM
13 years 9 months ago
Declarative failure recovery for sensor networks
Wireless sensor networks consist of a system of distributed sensors embedded in the physical world, and promise to allow observation of previously unobservable phenomena. Since th...
Ramakrishna Gummadi, Nupur Kothari, Todd D. Millst...
HPDC
2011
IEEE
12 years 9 months ago
Algorithm-based recovery for iterative methods without checkpointing
In today’s high performance computing practice, fail-stop failures are often tolerated by checkpointing. While checkpointing is a very general technique and can often be applied...
Zizhong Chen
SIGMETRICS
2004
ACM
13 years 11 months ago
Failure recovery for structured P2P networks: protocol design and performance evaluation
Measurement studies indicate a high rate of node dynamics in p2p systems. In this paper, we address the question of how high a rate of node dynamics can be supported by structured...
Simon S. Lam, Huaiyu Liu
IPPS
2010
IEEE
13 years 3 months ago
Scalable failure recovery for high-performance data aggregation
Many high-performance tools, applications and infrastructures, such as Paradyn, STAT, TAU, Ganglia, SuperMon, Astrolabe, Borealis, and MRNet, use data aggregation to synthesize lar...
Dorian C. Arnold, Barton P. Miller