DMTCP (Distributed MultiThreaded CheckPointing) is a transparent user-level checkpointing package for distributed applications. Checkpointing and restart is demonstrated for a wid...
We describe the software architecture, technical features, and performance of TICK (Transparent Incremental Checkpointer at Kernel level), a system-level checkpointer implemented ...
We present a transparent, system-level checkpointing solution for master-worker parallelism that automatically adapts, upon restart, to the number of processor nodes available. Th...
We have created ZapC, a novel system for transparent coordinated checkpoint-restart of distributed network applications on commodity clusters. ZapC provides a thin virtualization ...
Cycle-harvesting systems such as Condor have been developed to make desktop machines in a local area (which are often similar to clusters in hardware configuration) available as ...