Fault tolerant systems based on the use of software design diversity may be able to achieve high levels of reliability more cost-effectively than other approaches, such as heroic ...
Abstract--To achieve a high product quality for nano-scale systems both realistic defect mechanisms and process variations must be taken into account. While existing approaches for...
The RAIN (Reliable Array of Independent Nodes) project at Caltech is focusing on creating highly reliable distributed systems by leveraging commercially available personal compute...
Paul S. LeMahieu, Vasken Bohossian, Jehoshua Bruck
Fault tolerance is an important property of large-scale multiagent systems as the failure rate grows with both the number of the hosts and deployed agents, and the duration of com...
Most of recent research on distributed shared memory (DSM)systems have focused on either careful design of node controllersor cache coherenceprotocols. Whileevaluating these desig...