Designing a distributed fault tolerance algorithm requires careful analysis of both fault models and diagnosis strategies. A system will fail if there are too many active faults, ...
The ability to guarantee that a system will continue to operate correctly under degraded conditions is key to the success of adopting multi-agent systems (MAS) as a paradigm for d...
We describe a novel approach for building a secure and fault tolerant data storage service in collaborative work environments, which uses perfect secret sharing schemes to store d...
This paper describes a highly available distributedvideo on demand (VoD) service which is inherently fault tolerant. The VoD service is provided by multiple servers that reside at...
In this paper, we describe the design and implementation of two mechanisms for fault-tolerance and recovery for complex scientific workflows on computational grids. We present our ...