Sciweavers

252 search results - page 4 / 51
» An Index-Based Checkpointing Algorithm for Autonomous Distri...
SRDS
1999
IEEE
15 years 1 month ago
An Adaptive Checkpointing Protocol to Bound Recovery Time with Message Logging
Numerous mathematical approaches have been proposed to determine the optimal checkpoint interval for minimizing total execution time of an application in the presence of failures....
Kuo-Feng Ssu, Bin Yao, W. Kent Fuchs
GRID
2004
Springer
15 years 2 months ago
Checkpoint and Restart for Distributed Components in XCAT3
With the advent of Grid computing, more and more high-end computational resources become available for use by scientists. While this opens up new avenues for scientific research,...
Sriram Krishnan, Dennis Gannon
ICDCS
2005
IEEE
15 years 3 months ago
Optimal Asynchronous Garbage Collection for RDT Checkpointing Protocols
Communication-induced checkpointing protocols that ensure rollback-dependency trackability (RDT) guarantee important properties to the recovery system without explicit coordinatio...
Rodrigo Schmidt, Islene C. Garcia, Fernando Pedone...
ICS
2004
Tsinghua U.
15 years 2 months ago
Adaptive incremental checkpointing for massively parallel systems
Given the scale of massively parallel systems, occurrence of faults is no longer an exception but a regular event. Periodic checkpointing is becoming increasingly important in the...
Saurabh Agarwal, Rahul Garg, Meeta Sharma Gupta, J...
IPPS
2005
IEEE
15 years 3 months ago
Current Practice and a Direction Forward in Checkpoint/Restart Implementations for Fault Tolerance
Checkpoint/restart is a general idea for which particular implementations enable various functionalities in computer systems, including process migration, gang scheduling, hiberna...
José Carlos Sancho, Fabrizio Petrini, Kei D...