This paper describes a new method for providingtransparent fault tolerance for parallel applications on a network of workstations. We have designed our method in the context of sh...
This work presents a new switching mechanism to tolerate arbitrary faults in interconnection networks with a negligible implementation cost. Although our routing technique can be ...
Byzantine fault tolerant systems behave correctly when no more than f out of 3f + 1 replicas fail. When there are more than f failures, traditional BFT protocols make no guarantee...
Record and Replay (RR) is a software based state replication solution designed to support recording and subsequent replay of the execution of unmodified applications running on mu...
Philippe Bergheaud, Dinesh Subhraveti, Marc Vertes
Building dependable distributed systems using ad hoc methods is a challenging task. Without proper support, an application programmer must face the daunting requirement of having ...
Jennifer Ren, Michel Cukier, Paul Rubel, William H...