In this paper a sufficient condition is given for minimal routing in n-dimensional (n-D) meshes with faulty nodes contained in a set of disjoint fault regions. It is based on an ...
The high-performance computing domain is being enriched by the inclusion of networks-on-chip (NoCs) as a key component of many-core (CMP or MPSoC) architectures. NoCs face...
Samuel Rodrigo, Jose Flich, Antoni Roca, Simone Me...
Designing applications with timeliness requirements in environments of uncertain synchrony is known to be a difficult problem. In this paper, we follow the perspective of timing ...
As high-performance clusters continue to grow in size, the mean time between failures shrinks. Thus, the issues of fault tolerance and reliability are becoming one of the challengi...
We present a new approach to fault tolerance for High Performance Computing systems. Our approach is based on a careful adaptation of the Algorithmic Based Fault Tolerance techniq...
George Bosilca, Remi Delmas, Jack Dongarra, Julien...