Sciweavers

90 search results - page 3 / 18
» System log pre-processing to improve failure prediction
Sort
View
CCGRID
2006
IEEE
13 years 11 months ago
Exploit Failure Prediction for Adaptive Fault-Tolerance in Cluster Computing
As the scale of cluster computing grows, it is becoming hard for long-running applications to complete without facing failures on large-scale clusters. To address this issue, chec...
Yawei Li, Zhiling Lan
DSN
2005
IEEE
13 years 11 months ago
Probabilistic QoS Guarantees for Supercomputing Systems
Supercomputing systems must be able to reliably and efficiently complete their assigned workloads, even in the presence of failures. This paper proposes a system that allows the ...
Adam J. Oliner, Larry Rudolph, Ramendra K. Sahoo, ...
ICPP
2007
IEEE
13 years 11 months ago
A Meta-Learning Failure Predictor for Blue Gene/L Systems
The demand for more computational power in science and engineering has spurred the design and deployment of ever-growing cluster systems. Even though the individual components use...
Prashasta Gujrati, Yawei Li, Zhiling Lan, Rajeev T...
ICPP
2008
IEEE
13 years 12 months ago
Dynamic Meta-Learning for Failure Prediction in Large-Scale Systems: A Case Study
Despite great efforts on the design of ultra-reliable components, the increase of system size and complexity has outpaced the improvement of component reliability. As a result, fa...
Jiexing Gu, Ziming Zheng, Zhiling Lan, John White,...
DSN
2006
IEEE
13 years 11 months ago
BlueGene/L Failure Analysis and Prediction Models
The growing computational and storage needs of several scientific applications mandate the deployment of extreme-scale parallel machines, such as IBM’s BlueGene/L which can acc...
Yinglung Liang, Yanyong Zhang, Anand Sivasubramani...