In this paper, we present a new fault tolerance system called DejaVu for transparent and automatic checkpointing, migration, and recovery of parallel and distributed applications....
Joseph F. Ruscio, Michael A. Heffner, Srinidhi Var...
We present a transparent, system-level checkpointing solution for master-worker parallelism that automatically adapts, upon restart, to the number of processor nodes available. Th...
We present a formal approach to implement and certify fault-tolerance in real-time embedded systems. The faultintolerant initial system consists of a set of independent periodic t...
— Large Clusters, high availability clusters and Grid deployments often suffer from network, node or operating system faults and thus require the use of fault tolerant programmin...
Transiently powered computing devices such as RFID tags, kinetic energy harvesters, and smart cards typically rely on programs that complete a task under tight time constraints be...