Abstract. In order to support the dependability analysis of a system under design in an early phase of the design process, so-called fault tolerance libraries can be created that c...
Aspect-oriented modeling is proposed to design the architecture of fault-tolerant systems. Notations are introduced that support the separate and modularized design of functional ...
We observe increasing interest in aggregating geographically distributed, heterogeneous resources to perform large-scale computations. MPI remains the most popular programming par...
We describe the communication infrastructure (CI) for our fault-tolerant cluster middleware, which is optimized for two classes of communication: for the applications and for the ...
Ming Li, Wenchao Tao, Daniel Goldberg, Israel Hsu,...
Designing a distributed fault tolerance algorithm requires careful analysis of both fault models and diagnosis strategies. A system will fail if there are too many active faults, ...