EDCC
2008
Springer

A Distributed Approach to Autonomous Fault Treatment in Spread

This paper presents the design and implementation of the Distributed Autonomous Replication Management (DARM) framework, built on top of the Spread group communication system. The objective of DARM is to improve the dependability characteristics of systems through a fault treatment mechanism. Unlike many existing fault tolerance frameworks, DARM focuses on deployment and operational aspects, where the gain in terms of improved dependability is likely to be the greatest. DARM is novel in that recovery decisions are distributed to each individual group deployed in the system, eliminating the need for a centralized manager with global information about all groups. This scheme allows groups to perform fault treatment on themselves. A group leader in each group is responsible for fault treatment by replacing failed group members; the approach also tolerates failure of the group leader. The advantages of the distributed approach are: (i) no need to maintain globally centralized infor...
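The per-group fault treatment described in the abstract can be illustrated with a minimal sketch. It is not DARM's implementation and does not use the Spread API: the `Group` class, the `target_size` parameter, the "oldest surviving member is leader" rule, and the `spare_nodes` pool are all assumptions made for illustration. It only shows how a group might replace failed members locally, and how leadership could pass on when the leader itself fails.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    """One replicated group that treats its own faults (hypothetical model)."""
    name: str
    target_size: int                              # assumed redundancy level to maintain
    members: list = field(default_factory=list)   # current view, in join order

    def leader(self):
        # Assumption: the leader is the oldest surviving member of the view,
        # so each member can determine it locally without a central manager.
        return self.members[0] if self.members else None

    def on_view_change(self, live_members, spare_nodes):
        """Install a new view and let the leader replace failed members.

        Returns the list of (leader, node) replacement decisions made.
        """
        live = set(live_members)
        # Preserve join order so leadership passes to the next-oldest member
        # when the previous leader is among the failed ones.
        self.members = [m for m in self.members if m in live]
        decisions = []
        leader = self.leader()
        if leader is None:
            return decisions  # no member survived; nothing can be decided locally
        spares = [n for n in spare_nodes if n not in self.members]
        while len(self.members) < self.target_size and spares:
            node = spares.pop(0)
            self.members.append(node)
            decisions.append((leader, node))
        return decisions


# The leader n1 fails; n2 takes over and starts a replacement on n4.
group = Group("bank", target_size=3, members=["n1", "n2", "n3"])
print(group.on_view_change(["n2", "n3"], ["n4", "n5"]))  # prints [('n2', 'n4')]
```

In this sketch every decision uses only the group's own view, which mirrors the abstract's point that no manager with global information about all groups is required.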
Hein Meling, Joakim L. Gilje
Added 19 Oct 2010
Updated 19 Oct 2010
Type Conference
Year 2008
Where EDCC
Authors Hein Meling, Joakim L. Gilje