Peer-to-peer systems enable efficient resource aggregation and are inherently scalable since they do not depend on any centralized authority. However, lack of a centralized autho...
We initiate the study of error confinement in distributed applications, where the goal is that only nodes that were directly hit by a fault may deviate from their correct external...
In this paper, we consider the problem of supporting fault tolerance for adaptive and time-critical applications in heterogeneous and unreliable grid computing environments. Our g...
This paper describes a research project about robot control across a computing Grid, first step toward a Grid solution for generic process control. A computational Grid can signi...
Fault tolerant algorithms are often designed under the t-out-of-n assumption, which is based on the assumption that all processes or components fail independently with equal proba...