Sciweavers

IEEEPACT
2009
IEEE
13 years 11 months ago
CPROB: Checkpoint Processing with Opportunistic Minimal Recovery
CPR (Checkpoint Processing and Recovery) is a physical register management scheme that supports a larger instruction window and higher average IPC than conventional ROB-style re...
Andrew D. Hilton, Neeraj Eswaran, Amir Roth
SRDS
1999
IEEE
13 years 9 months ago
An Adaptive Checkpointing Protocol to Bound Recovery Time with Message Logging
Numerous mathematical approaches have been proposed to determine the optimal checkpoint interval for minimizing total execution time of an application in the presence of failures....
Kuo-Feng Ssu, Bin Yao, W. Kent Fuchs
TPDS
1998
13 years 4 months ago
On Coordinated Checkpointing in Distributed Systems
Coordinated checkpointing simplifies failure recovery and eliminates domino effects in case of failures by preserving a consistent global checkpoint on stable storage. However, ...
Guohong Cao, Mukesh Singhal
SAC
2006
ACM
13 years 10 months ago
Adaptive page-level incremental checkpointing based on expected recovery time
Incremental checkpointing, which is intended to minimize checkpointing overhead, saves only the modified pages of a process. This means that in incremental checkpointing, the time...
Sangho Yi, Junyoung Heo, Yookun Cho, Jiman Hong
ICDCS
2012
IEEE
11 years 7 months ago
Combining Partial Redundancy and Checkpointing for HPC
Today’s largest High Performance Computing (HPC) systems exceed one Petaflops (10^15 floating point operations per second) and exascale systems are projected within seven years...
James Elliott, Kishor Kharbas, David Fiala, Frank ...