The probability that a failure will occur before the end of the computation increases as the number of processors used in a high-performance computing application increases. For l...
Researchers have made great strides in improving the fault tolerance of both centralized and replicated systems against arbitrary (Byzantine) faults. However, there are hard limit...
Byung-Gon Chun, Petros Maniatis, Scott Shenker, Jo...
Present and future semiconductor technologies are characterized by increasing parameter variations as well as an increasing susceptibility to external disturbances. Transient err...
Self-stabilization is a versatile approach to fault-tolerance since it permits a distributed system to recover from any transient fault that arbitrarily corrupts the contents of a...
The general problem of verifying coherence for shared-memory multiprocessor executions is NP-Complete. Verifying memory consistency models is therefore NP-Hard, because memory con...