In this paper we present a new fault tolerant clock synchronization algorithm called the Fault Tolerant Daisy Chain algorithm. It is intended for internal clock synchronization of...
Abstract--We present a fault tolerant task pool execution environment that is capable of performing fine-grain selective restart using a lightweight, distributed task completion tr...
James Dinan, Arjun Singri, P. Sadayappan, Sriram K...
As computational clusters increase in size, their mean-time-to-failure reduces. Typically checkpointing is used to minimize the loss of computation. Most checkpointing techniques, ...
In this paper we study the computation error tolerance properties of motion estimation algorithms. We are motivated by two scenarios where hardware systems may introduce computati...
A case study of performance and dependability evaluation of fault-tolerant multiprocessors is presented. Two specific architectures are analyzed taking into account system functio...