ing Abstraction to Improve Fault Tolerance MIGUEL CASTRO Microsoft Research and RODRIGO RODRIGUES and BARBARA LISKOV MIT Laboratory for Computer Science Software errors are a major...
Replication is widely used to improve fault tolerance in distributed and multi-agent systems. In this paper, we present a different point of view on replication in multi-agent syst...
Currently, some coarse measures like global network latency are used to compare routing protocols. These measures do not provide enough insight of traffic distribution among networ...
Abstract—This paper describes a method to implement faulttolerant services in distributed systems based on the idea of fused state machines. The theory of fused state machines us...
Fault tolerance is a very important concern for critical high performance applications using the MPI library. Several protocols provide automatic and transparent fault detection a...
Pierre Lemarinier, Aurelien Bouteiller, Thomas H&e...