Sciweavers

» On Coordinated Checkpointing in Distributed Systems
CONCURRENCY
2007
Performance and effectiveness trade-off for checkpointing in fault-tolerant distributed systems
Panagiotis Katsaros, Lefteris Angelis, Constantine...
ICS
2004
Tsinghua U.
Adaptive incremental checkpointing for massively parallel systems
Given the scale of massively parallel systems, occurrence of faults is no longer an exception but a regular event. Periodic checkpointing is becoming increasingly important in the...
Saurabh Agarwal, Rahul Garg, Meeta Sharma Gupta, J...
TROB
2002
Distributed surveillance and reconnaissance using multiple autonomous ATVs: CyberScout
The objective of the CyberScout project is to develop an autonomous surveillance and reconnaissance system using a network of all-terrain vehicles. In this paper, we focus on two f...
Mahesh Saptharishi, C. Spence Oliver, Christopher ...
IPPS
2005
IEEE
Performance Implications of Periodic Checkpointing on Large-Scale Cluster Systems
Large-scale systems like BlueGene/L are susceptible to a number of software and hardware failures that can affect system performance. Periodic application checkpointing is a commo...
Adam J. Oliner, Ramendra K. Sahoo, José E. ...
ISCA
2002
IEEE
SafetyNet: Improving the Availability of Shared Memory Multiprocessors with Global Checkpoint/Recovery
We develop an availability solution, called SafetyNet, that uses a unified, lightweight checkpoint/recovery mechanism to support multiple long-latency fault detection schemes. At...
Daniel J. Sorin, Milo M. K. Martin, Mark D. Hill, ...