The overhead of consensus failure recovery

13 years 5 months ago
The overhead of consensus failure recovery
Abstract Many reliable distributed systems are consensusbased and typically operate under two modes: a fast normal mode in failure-free synchronous periods, and a slower recovery mode following asynchrony and failures. A lot of work has been devoted to optimize the normal mode, but little has focused on optimizing the recovery mode. This paper seeks to understand whether the recovery mode is inherently slower than the normal mode. In particular, we consider consensus algorithms in the round-based eventually synchronous model of [11], where t out of n processes may fail by crashing, messages may be lost, and the system may be asynchronous for arbitrarily long, but eventually the system becomes synchronous and no new failure occurs (we say that the system becomes stable). For t ≥ n/3, we prove a lower bound of three rounds for achieving a global decision whenever the system becomes stable, and we contrast this with a bound of two rounds when t < n/3. We then give matching algorithms...
Partha Dutta, Rachid Guerraoui, Idit Keidar
Added 13 Dec 2010
Updated 13 Dec 2010
Type Journal
Year 2007
Where DC
Authors Partha Dutta, Rachid Guerraoui, Idit Keidar
Comments (0)