Automated addition of fault tolerance to existing programs is highly desirable, as it allows the designer to focus on the system behavior in the absence of faults and leave the fa...
In this paper, we present a study of the fault-tolerant nature of the island model when applied to genetic algorithms. Parallel and distributed models have been extensively appli...
Some safety-critical distributed embedded systems may need to use centralized components to achieve certain dependability properties. The difficulty in combining centralized and d...
The distributed recovery block (DRB) scheme is a widely applicable approach for realizing both hardware and software fault tolerance in real-time distributed and parallel compute...
Initial versions of MPI were designed to work efficiently on multiprocessors that had very little job control and thus static process models. Subsequently forcing them to suppor...