
CLUSTER
2005
IEEE

Transparent Checkpoint-Restart of Distributed Applications on Commodity Clusters

We have created ZapC, a novel system for transparent coordinated checkpoint-restart of distributed network applications on commodity clusters. ZapC provides a thin virtualization layer on top of the operating system that decouples a distributed application from dependencies on the cluster nodes on which it is executing. This decoupling enables ZapC to checkpoint an entire distributed application across all nodes in a coordinated manner, so that it can later be restarted from the checkpoint on a different set of cluster nodes. ZapC checkpoint-restart operations execute in parallel across the cluster nodes, which speeds up checkpoint and restart. ZapC uniquely supports network state in a transport-protocol-independent manner, including correctly saving and restoring socket and protocol state for both TCP and UDP connections. We have implemented a ZapC Linux prototype and demonstrate that it provides low virtualization overhead and fast checkpoint-restart times f...
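To make the idea of a coordinated, parallel checkpoint concrete, the following is a minimal sketch of the generic suspend / save-in-parallel / resume pattern, assuming each node's share of the application state can be captured as a plain Python value. It is not ZapC's implementation: the `Node` class and the `coordinated_checkpoint` and `restart` functions are hypothetical illustrations, and the sketch deliberately omits the hard part ZapC addresses, namely saving and restoring TCP/UDP socket and protocol state. Checkpoint images are kept by position rather than by physical host name, loosely mirroring how a virtualization layer lets the application be restarted on different nodes.

```python
import copy
from concurrent.futures import ThreadPoolExecutor


class Node:
    """Hypothetical stand-in for one cluster node running part of the application."""

    def __init__(self, name, state):
        self.name = name        # physical host name (not stored in the checkpoint image)
        self.state = state      # application state local to this node
        self.running = True

    def suspend(self):
        self.running = False    # stop local processes so their state stops changing

    def resume(self):
        self.running = True

    def save(self):
        # Snapshot local state only; real systems must also capture kernel/network state.
        assert not self.running, "only a suspended node may be checkpointed"
        return copy.deepcopy(self.state)


def coordinated_checkpoint(nodes):
    """Suspend every node, save all of them in parallel, then resume them all."""
    for n in nodes:                                   # phase 1: global suspend
        n.suspend()
    try:
        with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
            images = list(pool.map(lambda n: n.save(), nodes))  # phase 2: parallel save
    finally:
        for n in nodes:                               # phase 3: resume everyone
            n.resume()
    return images                                     # ordered like `nodes`, host-independent


def restart(images, new_names):
    """Restore each checkpoint image onto a possibly different set of nodes."""
    if len(images) != len(new_names):
        raise ValueError("need exactly one target node per checkpoint image")
    return [Node(name, copy.deepcopy(img)) for name, img in zip(new_names, images)]


cluster = [Node("node1", {"rank": 0, "step": 42}), Node("node2", {"rank": 1, "step": 42})]
images = coordinated_checkpoint(cluster)
moved = restart(images, ["node7", "node8"])
print([(n.name, n.state["rank"]) for n in moved])  # [('node7', 0), ('node8', 1)]
```

Suspending every node before any node saves is what makes the snapshot globally consistent in this toy model; the per-node saves are then independent, which is why they can run in parallel.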
Oren Laadan, Dan B. Phung, Jason Nieh
Added 24 Jun 2010
Updated 24 Jun 2010
Type Conference
Year 2005
Where CLUSTER
Authors Oren Laadan, Dan B. Phung, Jason Nieh