Sciweavers

HPDC
2007
IEEE

Failure-aware checkpointing in fine-grained cycle sharing systems

Fine-Grained Cycle Sharing (FGCS) systems aim to use the large pool of idle computational resources available on the Internet. Such systems allow guest jobs to run on a host as long as they do not significantly impact the host's local users. Because hosts are typically provided voluntarily, their availability fluctuates greatly. To provide fault tolerance to guest jobs without adding significant computational overhead, we propose failure-aware checkpointing techniques that apply knowledge of resource availability to select checkpoint repositories and to determine checkpoint intervals. We present schemes for selecting reliable and efficient repositories from the non-dedicated hosts that contribute their disk storage. These schemes are formulated as 0/1 programming problems to optimize the network overhead of transferring checkpoints and the work lost when a storage host is unavailable at the time it is needed to recover a guest job. We determine the checkpoint interval by comp...
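
The abstract does not give the 0/1 formulation itself, so the following is only a minimal sketch of what a repository-selection problem of this kind might look like. Everything in it is an assumption made for illustration: the function name select_repositories, the inputs transfer_cost, p_unavailable, k and work_at_risk, the independence of host failures, and the objective (total transfer overhead plus the expected work lost if every chosen host is down). It enumerates all 0/1 selection vectors with exactly k ones instead of calling a solver, which is practical only for a small number of hosts.

```python
from itertools import combinations
from math import prod


def select_repositories(transfer_cost, p_unavailable, k, work_at_risk):
    """Pick exactly k storage hosts by exhaustive 0/1 search (illustrative only).

    transfer_cost[i]  -- assumed cost of sending a checkpoint to host i
    p_unavailable[i]  -- assumed predicted probability that host i is down
                         when the checkpoint is needed
    work_at_risk      -- assumed amount of guest-job work lost if no chosen
                         host can serve the checkpoint
    Returns (x, cost), where x[i] is 1 if host i is selected and 0 otherwise.
    """
    n = len(transfer_cost)
    if len(p_unavailable) != n:
        raise ValueError("transfer_cost and p_unavailable must have equal length")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of hosts")

    best_cost, best_choice = float("inf"), None
    for chosen in combinations(range(n), k):
        # Network overhead: the checkpoint is copied to every chosen host.
        overhead = sum(transfer_cost[i] for i in chosen)
        # Recovery fails only if all chosen hosts are down (independence assumed).
        p_all_down = prod(p_unavailable[i] for i in chosen)
        cost = overhead + work_at_risk * p_all_down
        if cost < best_cost:
            best_cost, best_choice = cost, chosen

    x = [1 if i in best_choice else 0 for i in range(n)]
    return x, best_cost
```

With a larger host pool, the same objective would normally be handed to an integer-programming solver instead of being enumerated; the brute-force loop is used here only to keep the sketch self-contained.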
Xiaojuan Ren, Rudolf Eigenmann, Saurabh Bagchi
Added 02 Jun 2010
Updated 02 Jun 2010
Type Conference
Year 2007
Where HPDC
Authors Xiaojuan Ren, Rudolf Eigenmann, Saurabh Bagchi