The RAIN (Reliable Array of Independent Nodes) project at Caltech is focusing on creating highly reliable distributed systems by leveraging commercially available personal compute...
Paul S. LeMahieu, Vasken Bohossian, Jehoshua Bruck
The implementation of imaging arrays for System-On-a-Chip (SOC) is aided by using fault-tolerant light sensors. Fault-tolerant redundancy in an Active Pixel Sensor (APS) is obtaine...
Sunjaya Djaja, Glenn H. Chapman, Desmond Y. H. Che...
Distributed information systems are critical to the functioning of many businesses; designing them to be dependable is a challenging but important task. We report our experience i...
Jeremy Bryans, John S. Fitzgerald, Alexander Roman...
RPC is one of the programming models envisioned for the Grid. In Internet-connected Large Scale Grids such as Desktop Grids, node and network failures are not rare events. This ...
This paper presents an approach for integrating fault-tolerance techniques into microprocessors by utilizing instruction redundancy as well as time redundancy. Smaller and smaller...