Sciweavers

» Systems Failures
DSN
2006
IEEE
15 years 5 months ago
A large-scale study of failures in high-performance computing systems
Designing highly dependable systems requires a good understanding of failure characteristics. Unfortunately, little raw data on failures in large IT installations is publicly avai...
Bianca Schroeder, Garth A. Gibson
CORR
2006
Springer
14 years 11 months ago
Exact Failure Frequency Calculations for Extended Systems
This paper shows how the steady-state availability and failure frequency can be calculated in a single pass for very large systems, when the availability is expressed as a product...
Annie Druault-Vicard, Christian Tanguy
ICPP
2007
IEEE
15 years 6 months ago
A Meta-Learning Failure Predictor for Blue Gene/L Systems
The demand for more computational power in science and engineering has spurred the design and deployment of ever-growing cluster systems. Even though the individual components use...
Prashasta Gujrati, Yawei Li, Zhiling Lan, Rajeev T...
EDCC
2005
Springer
15 years 5 months ago
Failure Detection with Booting in Partially Synchronous Systems
Unreliable failure detectors are a well known means to enrich asynchronous distributed systems with time-free semantics that allow to solve consensus in the presence of crash failu...
Josef Widder, Gérard Le Lann, Ulrich Schmid
PODC
2009
ACM
15 years 4 months ago
The weakest failure detector for solving k-set agreement
A failure detector is a distributed oracle that provides processes in a distributed system with hints about failures. The notion of a weakest failure detector captures the exact a...
Eli Gafni, Petr Kuznetsov