: We present a new approach to fault tolerance for High Performance Computing system. Our approach is based on a careful adaptation of the Algorithmic Based Fault Tolerance techniq...
George Bosilca, Remi Delmas, Jack Dongarra, Julien...
In this paper we address the problem of reducing the energy consumption in distributed embedded systems associated with time-constraints and equipped with fault-tolerant technique...
The structural redundancy inherent to on-chip interconnection networks [networks on chip (NoC)] can be exploited by adaptive routing algorithms in order to provide connectivity eve...
Protocols that solve agreement problems are essential building blocks for fault tolerant distributed systems. While many protocols have been published, little has been done to ana...
In this paper, we describe a scheme for tolerating and recovering from mid-query faults in a distributed shared nothing database. Rather than aborting and restarting queries, our s...
Christopher Yang, Christine Yen, Ceryen Tan, Samue...