Sciweavers

366 search results - page 1 / 74
» Algorithmic Based Fault Tolerance Applied to High Performanc...
Sort
View
CORR
2008
Springer
134views Education» more  CORR 2008»
14 years 10 months ago
Algorithmic Based Fault Tolerance Applied to High Performance Computing
: We present a new approach to fault tolerance for High Performance Computing system. Our approach is based on a careful adaptation of the Algorithmic Based Fault Tolerance techniq...
George Bosilca, Remi Delmas, Jack Dongarra, Julien...
111
Voted
AP2PS
2009
IEEE
15 years 1 months ago
Algorithm-Based Fault Tolerance Applied to P2P Computing Networks
—P2P computing platforms are subject to a wide range of attacks. In this paper, we propose a generalisation of the previous disk-less checkpointing approach for fault-tolerance i...
Thomas Roche, Mathieu Cunche, Jean-Louis Roch
SC
2005
ACM
15 years 3 months ago
Transparent, Incremental Checkpointing at Kernel Level: a Foundation for Fault Tolerance for Parallel Computers
We describe the software architecture, technical features, and performance of TICK (Transparent Incremental Checkpointer at Kernel level), a system-level checkpointer implemented ...
Roberto Gioiosa, José Carlos Sancho, Song J...
113
Voted
ICS
2011
Tsinghua U.
14 years 1 months ago
High performance linpack benchmark: a fault tolerant implementation without checkpointing
The probability that a failure will occur before the end of the computation increases as the number of processors used in a high performance computing application increases. For l...
Teresa Davies, Christer Karlsson, Hui Liu, Chong D...
77
Voted
PPOPP
2005
ACM
15 years 3 months ago
Fault tolerant high performance computing by a coding approach
As the number of processors in today’s high performance computers continues to grow, the mean-time-to-failure of these computers are becoming significantly shorter than the exe...
Zizhong Chen, Graham E. Fagg, Edgar Gabriel, Julie...