Sciweavers

HPDC
2008
IEEE
15 years 4 months ago
Issues in applying data mining to grid job failure detection and diagnosis
As grid computation systems become larger and more complex, manually diagnosing failures in jobs becomes impractical. Recently, machine-learning techniques have been proposed to d...
Lakshmikant Shrinivas, Jeffrey F. Naughton
FAST
2007
14 years 11 months ago
Disk Failures in the Real World: What Does an MTTF of 1,000,000 Hours Mean to You?
Component failure in large-scale IT installations is becoming an ever larger problem as the number of components in a single cluster approaches a million. In this paper, we presen...
Bianca Schroeder, Garth A. Gibson
TROB
2010
14 years 8 months ago
A Probabilistic Particle-Control Approximation of Chance-Constrained Stochastic Predictive Control
Robotic systems need to be able to plan control actions that are robust to the inherent uncertainty in the real world. This uncertainty arises due to uncertain state estimation,...
Lars Blackmore, Masahiro Ono, Askar Bektassov, Bri...
ICDCS
2012
IEEE
13 years 1 days ago
PREPARE: Predictive Performance Anomaly Prevention for Virtualized Cloud Systems
Virtualized cloud systems are prone to performance anomalies due to various reasons such as resource contentions, software bugs, and hardware failures. In this paper, we...
Yongmin Tan, Hiep Nguyen, Zhiming Shen, Xiaohui Gu...
KDD
2005
ACM
15 years 3 months ago
Failure detection and localization in component based systems by online tracking
The increasing complexity of today’s systems makes fast and accurate failure detection essential for their use in mission-critical applications. Various monitoring methods provi...
Haifeng Chen, Guofei Jiang, Cristian Ungureanu, Ke...