ICPP
2002
IEEE

ART: Robustness of Meshes and Tori for Parallel and Distributed Computation

In this paper, we formulate the array robustness theorems (ARTs) for efficient computation and communication on faulty arrays. No hardware redundancy is required and no assumption is made about the availability of a complete submesh or subtorus. Based on ARTs, a very wide variety of problems, including sorting, FFT, total exchange, permutation, and some matrix operations, can be solved with a slowdown factor of 1 + o(1). The number of faults tolerated by ARTs ranges from o(min(n^{1-1/d}, n/d, n/h)) for n-ary d-cubes with worst-case faults to as large as o(N) for most N-node 2-D meshes or tori with random faults, where h is the number of data items per processor. The resultant running times are the best results reported thus far for solving many problems on faulty arrays. Based on ARTs and several other components such as robust libraries, the priority emulation discipline, and X'Y' routing, we introduce the robust adaptation interface layer (RAIL) as a middleware between ordinary algorithm...
Chi-Hsiang Yeh, Behrooz Parhami
Added 14 Jul 2010
Updated 14 Jul 2010
Type Conference
Year 2002
Where ICPP
Authors Chi-Hsiang Yeh, Behrooz Parhami