Continuously shrinking feature sizes result in an increasing susceptibility of circuits to transient faults, e.g. due to environmental radiation. Approaches to implement fault tol...
Fault tolerance is an important issue for large machines with tens or hundreds of thousands of processors. Checkpoint-based methods, currently used on most machines, roll back all ...
Most application-level fault tolerance schemes in the literature are non-adaptive in the sense that the fault tolerance schemes incorporated in applications are usually designed witho...
Zizhong Chen, Ming Yang, Guillermo A. Francia III,...
The domain-partition (DP) model is a general model for the reliability maximization problem under given redundancy. In this paper, an improved DP model is used to formulate a reconfigurati...
A novel approach to hardware fault tolerance is demonstrated that takes inspiration from the human immune system as a method of fault detection. The human immune system is a remark...