Sciweavers

668 search results - page 1 / 134
IPPS
2007
IEEE
13 years 11 months ago
Implementing and Evaluating Automatic Checkpointing
As the size and popularity of computer clusters continue to grow, fault tolerance is becoming a crucial factor in ensuring high performance and reliability for applications. To provide...
Antonio S. Martins, Ronaldo Augusto Lara Gonç...
IPPS
2006
IEEE
13 years 11 months ago
Recent advances in checkpoint/recovery systems
Checkpoint and Recovery (CPR) systems have many uses in high-performance computing. Because of this, many developers have implemented it, by hand, into their applications. One of ...
Greg Bronevetsky, Rohit Fernandes, Daniel Marques,...
CORR
2006
Springer
13 years 5 months ago
Enabling user-driven Checkpointing strategies in Reverse-mode Automatic Differentiation
This paper presents a new functionality of the Automatic Differentiation (AD) Tool tapenade. tapenade generates adjoint codes which are widely used for optimization or inverse prob...
Laurent Hascoët, Mauricio Araya-Polo
SRDS
1999
IEEE
13 years 9 months ago
An Adaptive Checkpointing Protocol to Bound Recovery Time with Message Logging
Numerous mathematical approaches have been proposed to determine the optimal checkpoint interval for minimizing total execution time of an application in the presence of failures....
Kuo-Feng Ssu, Bin Yao, W. Kent Fuchs