Fault tolerance is a major concern to guarantee availability of critical services as well as application execution. Traditional approaches for fault tolerance include checkpoint/r...
: We present a new approach to fault tolerance for High Performance Computing system. Our approach is based on a careful adaptation of the Algorithmic Based Fault Tolerance techniq...
George Bosilca, Remi Delmas, Jack Dongarra, Julien...
In this paper we introduce Resizable Data Composer-Cache (RDC-Cache). This novel cache architecture operates correctly at sub 500 mV in 65 nm technology tolerating large number of...
Avesta Sasan, Houman Homayoun, Ahmed M. Eltawil, F...
Phoenix is a fault-tolerantreal-time network-attachedstorage device (NASD). Like other NASD architectures, Phoenix provides an object-based interface to data stored on network-att...
Abstract. In order to support the dependability analysis of a system under design in an early phase of the design process, so-called fault tolerance libraries can be created that c...