
CLUSTER
2005
IEEE

Minimizing the Network Overhead of Checkpointing in Cycle-harvesting Cluster Environments

Cycle-harvesting systems such as Condor have been developed to make desktop machines on a local-area network (which are often similar to clusters in hardware configuration) available as a compute platform. To provide this dual-use capability, opportunistic jobs harvesting cycles from the desktop must be checkpointed before the desktop resources are reclaimed by their owners and the job is evacuated. In this paper, we investigate a new system for computing efficient checkpoint schedules in cycle-harvesting environments. Our system records the historical availability of each resource and fits a statistical model to the observations. Because checkpoint data must often traverse the network (i.e., the desktop hosts do not provide sufficient persistent storage for checkpoints), we combine this model with predictions of network performance to the storage site to compute a checkpoint schedule. When an application is initiated on a particular resource, the system uses the computed distribution to param...
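The pipeline described above (fit an availability model, predict the network cost of shipping a checkpoint to the storage site, then derive a schedule) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: it assumes an exponential availability model, whose maximum-likelihood mean is simply the sample mean of observed availability durations; a hypothetical network predictor that divides checkpoint size by the mean of recent bandwidth measurements; and Young's first-order approximation T = sqrt(2CM) for the checkpoint interval, where C is the checkpoint cost and M is the mean time to eviction. All function names and parameters are hypothetical.

```python
import math
from statistics import mean


def fit_exponential_mean(availability_durations_s):
    """Assumed exponential model: the MLE of its mean is the sample mean (seconds)."""
    if not availability_durations_s:
        raise ValueError("need at least one observed availability duration")
    return mean(availability_durations_s)


def predict_checkpoint_cost(checkpoint_bytes, bandwidth_samples_bps, latency_s=0.0):
    """Hypothetical network predictor: size / mean observed bandwidth, plus latency."""
    predicted_bw = mean(bandwidth_samples_bps)  # bytes per second
    return checkpoint_bytes / predicted_bw + latency_s


def young_interval(checkpoint_cost_s, mean_time_to_eviction_s):
    """Young's first-order optimal compute time between checkpoints: sqrt(2*C*M)."""
    return math.sqrt(2.0 * checkpoint_cost_s * mean_time_to_eviction_s)


def checkpoint_schedule(interval_s, checkpoint_cost_s, horizon_s):
    """Start times (seconds after job start) of each checkpoint that fits in the horizon.

    Each cycle is interval_s of computation followed by checkpoint_cost_s of transfer.
    """
    times = []
    t = interval_s
    while t + checkpoint_cost_s <= horizon_s:
        times.append(t)
        t += interval_s + checkpoint_cost_s
    return times


if __name__ == "__main__":
    mtte = fit_exponential_mean([1800, 3600, 5400])  # hypothetical uptimes (s)
    cost = predict_checkpoint_cost(500_000_000, [8e6, 10e6, 12e6])  # 500 MB image
    interval = young_interval(cost, mtte)  # sqrt(2 * 50 * 3600) = 600 s
    print(checkpoint_schedule(interval, cost, horizon_s=3600))
    # prints [600.0, 1250.0, 1900.0, 2550.0, 3200.0]
```

Because the exponential model is memoryless, this sketch produces the same interval no matter how long the host has already been available. A fitted model such as a Weibull would instead make the schedule depend on the resource's state when the application starts on it, at the cost of a less direct closed form.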
Daniel Nurmi, John Brevik, Richard Wolski
Added 24 Jun 2010
Updated 24 Jun 2010
Type Conference
Year 2005
Where CLUSTER