Replication is a key strategy for improving locality, fault tolerance and availability in distributed systems. The paper focuses on distributed file systems and presents a system ...
For large distributed systems built from inexpensive components, one expects to see incessant failures. This paper proposes two models for such faults and analyzes two we...
Replication is a key technique for improving fault tolerance. Replication can also improve application performance under some circumstances, but can have the opposite effect under...
As the size and popularity of computer clusters continue to grow, fault tolerance is becoming crucial to ensuring high performance and reliability for applications. To provide...
Antonio S. Martins, Ronaldo Augusto Lara Gonç...
We consider the problem of reliable broadcast in an infinite grid (or finite toroidal) radio network under Byzantine and crash-stop failures. We present bounds on the maximum...