Sciweavers

361 search results - page 1 / 73
TC
2008
13 years 4 months ago
Adaptive Fault Management of Parallel Applications for High-Performance Computing
As the scale of high-performance computing (HPC) continues to grow, failure resilience of parallel applications becomes crucial. In this paper, we present FT-Pro, an adaptive fault...
Zhiling Lan, Yawei Li
PPOPP
2005
ACM
13 years 10 months ago
Fault tolerant high performance computing by a coding approach
As the number of processors in today’s high performance computers continues to grow, the mean-time-to-failure of these computers is becoming significantly shorter than the exe...
Zizhong Chen, Graham E. Fagg, Edgar Gabriel, Julie...
IPPS
2007
IEEE
13 years 10 months ago
Self Adaptive Application Level Fault Tolerance for Parallel and Distributed Computing
Most application level fault tolerance schemes in the literature are non-adaptive in the sense that the fault tolerance schemes incorporated in applications are usually designed witho...
Zizhong Chen, Ming Yang, Guillermo A. Francia III,...
CORR
2008
Springer
13 years 4 months ago
Algorithmic Based Fault Tolerance Applied to High Performance Computing
We present a new approach to fault tolerance for High Performance Computing systems. Our approach is based on a careful adaptation of the Algorithmic Based Fault Tolerance techniq...
George Bosilca, Remi Delmas, Jack Dongarra, Julien...
PDP
2008
IEEE
13 years 10 months ago
System-Level Virtualization for High Performance Computing
System-level virtualization has been a research topic since the 1970s but has regained popularity during the past few years because of the availability of efficient solutions such as...
Geoffroy Vallée, Thomas Naughton, Christian...