Fault tolerance is a major concern to guarantee availability of critical services as well as application execution. Traditional approaches for fault tolerance include checkpoint/r...
The increasing role of communication networks in today’s society results in a demand for higher levels of network availability and reliability. At the same time, fault managemen...
Large-scale parallel computing is relying increasingly on clusters with thousands of processors. At such large counts of compute nodes, faults are becoming common place. Current t...
Arun Babu Nagarajan, Frank Mueller, Christian Enge...
In component-based systems, fault-tolerance concerns are typically handled by manually programmed fault containers. The purpose of fault containers is to prevent error propagation...
Intrusion-tolerant replication enables the construction of systems that tolerate a finite number of malicious faults. An arbitrary number of faults can be tolerated during system ...