Sciweavers

3 search results - page 1 / 1
CCGRID
2008
IEEE
13 years 6 months ago
Using Probabilistic Characterization to Reduce Runtime Faults in HPC Systems
Abstract: The current trend in high performance computing is to aggregate ever larger numbers of processing and interconnection elements in order to achieve desired levels of compu...
Jim M. Brandt, Bert J. Debusschere, Ann C. Gentile...
FGCS
2002
13 years 4 months ago
HARNESS fault tolerant MPI design, usage and performance issues
Initial versions of MPI were designed to work efficiently on multi-processors which had very little job control and thus static process models. Subsequently forcing them to suppor...
Graham E. Fagg, Jack Dongarra
PASTE
2010
ACM
13 years 9 months ago
Learning universal probabilistic models for fault localization
Recently there has been significant interest in employing probabilistic techniques for fault localization. Using dynamic dependence information for multiple passing runs, learnin...
Min Feng, Rajiv Gupta