Sciweavers

ICS
2004
Tsinghua U.

Adaptive incremental checkpointing for massively parallel systems

Given the scale of massively parallel systems, the occurrence of faults is no longer an exception but a regular event. Periodic checkpointing is becoming increasingly important in these systems. However, the huge memory footprints of parallel applications place severe limitations on the scalability of normal checkpointing techniques. Incremental checkpointing is a well-researched technique that addresses scalability concerns, but most implementations require paging support from the hardware and the underlying operating system, which may not always be available. In this paper, we propose a software-based adaptive incremental checkpoint technique which uses a secure hash function to uniquely identify changed blocks in memory. Our algorithm is the first self-optimizing algorithm that dynamically computes the optimal block boundaries, based on the history of changed blocks. This provides better opportunities for minimizing checkpoint file size. Since the hash is computed in software, we do not n...
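The core idea of hash-based change detection described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a fixed block size and SHA-256 as the secure hash, and it omits the adaptive recomputation of block boundaries entirely. The names `BLOCK_SIZE`, `block_hashes`, and `incremental_checkpoint` are hypothetical and chosen only for illustration.

```python
import hashlib

# Assumption: a fixed 4 KiB block size. The paper adapts block boundaries
# dynamically; this sketch does not attempt that.
BLOCK_SIZE = 4096


def block_hashes(memory: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Return one SHA-256 digest per fixed-size block of the memory image."""
    return [
        hashlib.sha256(memory[start:start + block_size]).digest()
        for start in range(0, len(memory), block_size)
    ]


def incremental_checkpoint(memory: bytes, previous_hashes: list,
                           block_size: int = BLOCK_SIZE):
    """Compare current block hashes against the previous checkpoint's hashes.

    Returns a dict mapping block index to block contents for every block
    whose hash differs (or that did not exist before), plus the new hash
    list to keep for the next checkpoint.
    """
    current_hashes = block_hashes(memory, block_size)
    changed = {}
    for index, digest in enumerate(current_hashes):
        is_new = index >= len(previous_hashes)
        if is_new or previous_hashes[index] != digest:
            start = index * block_size
            changed[index] = memory[start:start + block_size]
    return changed, current_hashes


# Usage: the first checkpoint saves every block, the second only the dirty one.
image = bytearray(4 * BLOCK_SIZE)
full, hashes = incremental_checkpoint(bytes(image), [])
image[5000] = 0xFF  # touches only the block at index 1
delta, hashes = incremental_checkpoint(bytes(image), hashes)
```

Because the comparison relies only on hashes computed in software, a scheme of this kind does not depend on page-protection support. The cost is that every block must be hashed at each checkpoint, which is why the choice of block boundaries matters for both hashing overhead and checkpoint size.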
Added 01 Jul 2010
Updated 01 Jul 2010
Type Conference
Year 2004
Where ICS
Authors Saurabh Agarwal, Rahul Garg, Meeta Sharma Gupta, José E. Moreira