We propose an approach for analyzing software architectures with respect to reliability to improve fault tolerance. The approach defines a failure scenario model that is based on ...
A master/worker paradigm for executing large-scale parallel discrete event simulation programs over networkenabled computational resources is proposed and evaluated. In contrast t...
Fault tolerant algorithms are often designed under the t-out-of-n assumption, which is based on the assumption that all processes or components fail independently with equal proba...
The vulnerability of computer nodes due to component failures is a critical issue for cluster-based file systems. This paper studies the development and deployment of mirroring in...
Abstract. Lock-free shared data structures in the setting of distributed computing have received a fair amount of attention. Major motivations of lock-free data structures include ...