Sciweavers

IPPS
2007
IEEE
13 years 11 months ago
Implementing and Evaluating Automatic Checkpointing
As the size and popularity of computer clusters go on growing, fault tolerance is becoming a crucial factor to ensure high performance and reliability for applications. To provide...
Antonio S. Martins, Ronaldo Augusto Lara Gonç...
EUROSYS
2009
ACM
14 years 2 months ago
Transparent checkpoints of closed distributed systems in Emulab
Emulab is a testbed for networked and distributed systems experimentation. Two guiding principles of its design are realism and control of experimentation. There is an inherent te...
Anton Burtsev, Prashanth Radhakrishnan, Mike Hible...