This paper describes an experiment performed on a Wide Area Network to assess and fairly compare the Quality of Service provided by a large family of failure detectors. Failure dete...
Designs for distributed systems must consider the possibility that failures will arise and must adopt specific failure detection strategies. We describe and analyze a self-regulat...
Kevin Mills, Scott Rose, Stephen Quirolgico, M. Br...
Robust distributed systems commonly employ high-level recovery mechanisms enabling the system to recover from a wide variety of problematic environmental conditions such as node f...
Charles Edwin Killian, Karthik Nagaraj, Salman Per...
The current interdomain routing protocol, BGP, is not resilient to a path failure due to its single-path and slowly converging route calculation. This paper proposes a novel approa...
Crash recovery in database systems aims to provide an acceptable level of protection from failure at a given engineering cost. A large number of recovery mechanisms are known, and...
S. Scheuerl, Richard C. H. Connor, Ronald Morrison...