Sciweavers

ICS
2004
Tsinghua U.
13 years 10 months ago
Adaptive incremental checkpointing for massively parallel systems
Given the scale of massively parallel systems, occurrence of faults is no longer an exception but a regular event. Periodic checkpointing is becoming increasingly important in the...
Saurabh Agarwal, Rahul Garg, Meeta Sharma Gupta, J...
SAC
2006
ACM
13 years 11 months ago
Adaptive page-level incremental checkpointing based on expected recovery time
Incremental checkpointing, which is intended to minimize checkpointing overhead, saves only the modified pages of a process. This means that in incremental checkpointing, the time...
Sangho Yi, Junyoung Heo, Yookun Cho, Jiman Hong
SC
2005
ACM
13 years 10 months ago
Transparent, Incremental Checkpointing at Kernel Level: a Foundation for Fault Tolerance for Parallel Computers
We describe the software architecture, technical features, and performance of TICK (Transparent Incremental Checkpointer at Kernel level), a system-level checkpointer implemented ...
Roberto Gioiosa, José Carlos Sancho, Song J...
ICPADS
2010
IEEE
13 years 2 months ago
Hybrid Checkpointing for MPI Jobs in HPC Environments
As the core count in high-performance computing systems keeps increasing, faults are becoming commonplace. Checkpointing addresses such faults but captures full process images ev...
Chao Wang, Frank Mueller, Christian Engelmann, Ste...
IPPS
2009
IEEE
13 years 11 months ago
Compiler-enhanced incremental checkpointing for OpenMP applications
As modern supercomputing systems reach the petaflop performance range, they grow in both size and complexity. This makes them increasingly vulnerable to failures from a variety ...
Greg Bronevetsky, Daniel Marques, Keshav Pingali, ...