JCP
2006

Fault Tolerance in a Multi-Layered DRE System: A Case Study

Dynamic resource management is a crucial part of the infrastructure for emerging distributed real-time embedded systems, responsible for keeping mission-critical applications operating and allocating the resources necessary for them to meet their requirements. Because of this, the resource manager must be fault-tolerant, with nearly continuous operation. This paper describes our efforts to develop a fault-tolerant multi-layer dynamic resource management capability and the challenges we encountered, some due to the fault tolerance requirements we needed to meet and others due to characteristics of the resource management software. The challenges include the need for extremely rapid recovery; supporting the characteristics of component middleware, including peer-to-peer communication and multi-tiered calling semantics; supporting multiple languages; and the co-existence of replicated and non-replicated elements. Making our multi-layer dynamic resource manager fault-tolerant required simult...
Paul Rubel, Joseph P. Loyall, Richard E. Schantz, Matthew Gillen
Added 13 Dec 2010
Updated 13 Dec 2010
Type Journal
Year 2006
Where JCP
Authors Paul Rubel, Joseph P. Loyall, Richard E. Schantz, Matthew Gillen