Sciweavers

Implementing and Evaluating Automatic Checkpointing

IPPS 2007, IEEE
DejaVu: Transparent User-Level Checkpointing, Migration, and Recovery for Distributed Systems
In this paper, we present a new fault tolerance system called DejaVu for transparent and automatic checkpointing, migration, and recovery of parallel and distributed applications....
Joseph F. Ruscio, Michael A. Heffner, Srinidhi Var...

CCGRID 2006, IEEE
Transparent Adaptive Library-Based Checkpointing for Master-Worker Style Parallelism
We present a transparent, system-level checkpointing solution for master-worker parallelism that automatically adapts, upon restart, to the number of processor nodes available. Th...
Gene Cooperman, Jason Ansel, Xiaoqin Ma

EMSOFT 2006, Springer
Implementing fault-tolerance in real-time systems by automatic program transformations
We present a formal approach to implement and certify fault-tolerance in real-time embedded systems. The fault-intolerant initial system consists of a set of independent periodic t...
Tolga Ayav, Pascal Fradet, Alain Girault

CLUSTER 2003, IEEE
Coordinated Checkpoint versus Message Log for Fault Tolerant MPI
Large clusters, high-availability clusters, and Grid deployments often suffer from network, node, or operating system faults and thus require the use of fault-tolerant programmin...
Aurelien Bouteiller, Pierre Lemarinier, Gér...

ASPLOS 2011, ACM
Mementos: system support for long-running computation on RFID-scale devices
Transiently powered computing devices such as RFID tags, kinetic energy harvesters, and smart cards typically rely on programs that complete a task under tight time constraints be...
Benjamin Ransford, Jacob Sorber, Kevin Fu