
ICDM
2007
IEEE

Failure Prediction in IBM BlueGene/L Event Logs

Frequent failures are becoming a serious concern for the high-end computing community, especially as applications and the underlying systems rapidly grow in size and complexity. To develop effective fault-tolerance strategies, there is a critical need to predict failure events. To this end, we have collected detailed event logs from IBM BlueGene/L, which has 128K processors and is currently the fastest supercomputer in the world. In this study, we first show how the event records can be converted into a data set that is appropriate for running classification techniques. We then apply several classifiers to the data: RIPPER (a rule-based classifier), Support Vector Machines (SVMs), a traditional nearest neighbor method, and a customized nearest neighbor method. We show that the customized nearest neighbor approach can outperform RIPPER and SVMs in terms of both coverage and precision. The results suggest that the customized nearest neighbor approach can be used t...
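The abstract says that event records are first turned into a data set suitable for classifiers, and that a nearest neighbor method is then applied. The sketch below illustrates one generic way to do this. It is a minimal illustration under stated assumptions, not the authors' pipeline:

- Each log record is assumed to be a hypothetical `(timestamp_seconds, category, is_fatal)` tuple.
- Time is cut into fixed windows of hypothetical length `window`.
- The features are counts of non-fatal events per category within one window.
- The label is whether any fatal event occurs in the following window.

The helper names `build_dataset` and `nearest_neighbor_predict` are hypothetical. The classifier is a plain 1-nearest-neighbor with Euclidean distance, i.e. the "traditional" variant, not the customized method the paper reports.

```python
import math
from collections import Counter


def build_dataset(events, categories, window):
    """Convert time-ordered event records into (features, label) pairs.

    Assumption: each event is a hypothetical (timestamp_seconds, category, is_fatal) tuple.
    Features for window k: counts of non-fatal events per category in window k.
    Label for window k: True if any fatal event occurs in window k + 1.
    """
    if not events:
        return []
    start = min(t for t, _, _ in events)
    end = max(t for t, _, _ in events)
    n_windows = int((end - start) // window) + 1

    counts = [Counter() for _ in range(n_windows)]  # non-fatal counts per window
    fatal = [False] * n_windows                     # does window k contain a fatal event?
    for t, category, is_fatal in events:
        k = int((t - start) // window)
        if is_fatal:
            fatal[k] = True
        else:
            counts[k][category] += 1

    # Pair each window's feature vector with the outcome of the next window.
    return [
        ([counts[k][c] for c in categories], fatal[k + 1])
        for k in range(n_windows - 1)
    ]


def nearest_neighbor_predict(train, x):
    """Plain 1-NN: return the label of the training vector closest to x."""
    _, label = min(train, key=lambda row: math.dist(row[0], x))
    return label


if __name__ == "__main__":
    # Hypothetical toy log: two event categories, 10-second windows.
    events = [
        (0, "NET", False), (5, "NET", False),
        (12, "MEM", True),
        (20, "MEM", False),
        (31, "NET", False), (35, "NET", False),
    ]
    data = build_dataset(events, ["NET", "MEM"], window=10)
    print(data)
    print(nearest_neighbor_predict(data, [3, 0]))
```

On this toy log the window containing two `NET` warnings is followed by a fatal event. A new window with a similar count profile is therefore predicted to precede a failure. Real feature design, window sizes and distance measures would need to follow the paper itself.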
Added 03 Jun 2010
Updated 03 Jun 2010
Type Conference
Year 2007
Where ICDM
Authors Yinglung Liang, Yanyong Zhang, Hui Xiong, Ramendra K. Sahoo