Sciweavers

ICAC
2009
IEEE

Ranking the importance of alerts for problem determination in large computer systems

The complexity of large computer systems has raised unprecedented challenges for system management. In practice, operators often collect large volumes of monitoring data from system components and set up many rules to check the data and trigger alerts. However, alerts from different rules usually differ in problem-reporting accuracy because their thresholds are often set manually, based on operators’ experience and intuition. Meanwhile, due to system dependencies, a single problem may trigger many alerts at the same time in a large system, and the critical question is which alert should be analyzed first in the subsequent problem determination process. In this paper, we propose a novel peer review mechanism to rank the importance of alerts; the top-ranked alerts are more likely to be true positives. After comparing a metric value against its threshold to generate alerts, we also compare the value with the equivalent thresholds from many other rules to determine the importance of ale...
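The peer review idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract is truncated and does not say how equivalent thresholds are derived or how the comparisons are combined, so this sketch assumes the equivalent thresholds are already available as plain numbers and scores each alert by the fraction of peer thresholds its metric value also exceeds. The names `Alert`, `peer_thresholds`, `peer_score`, and `rank_alerts` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Alert:
    rule: str                     # name of the rule that owns this metric
    value: float                  # observed metric value
    threshold: float              # the rule's own, manually set threshold
    peer_thresholds: List[float]  # ASSUMPTION: equivalent thresholds from
                                  # other rules, already mapped to this metric


def peer_score(alert: Alert) -> float:
    """Fraction of peer thresholds that the metric value also exceeds.

    ASSUMPTION: a simple vote; the source does not specify the scoring rule.
    """
    if not alert.peer_thresholds:
        return 0.0
    agree = sum(1 for t in alert.peer_thresholds if alert.value > t)
    return agree / len(alert.peer_thresholds)


def rank_alerts(alerts: List[Alert]) -> List[Alert]:
    """Keep only rules whose own threshold fired, then rank by peer agreement."""
    fired = [a for a in alerts if a.value > a.threshold]
    return sorted(fired, key=peer_score, reverse=True)


if __name__ == "__main__":
    sample = [
        Alert("cpu", 95.0, 90.0, [80.0, 85.0, 99.0]),
        Alert("mem", 70.0, 60.0, [75.0, 80.0, 90.0]),
        Alert("disk", 50.0, 60.0, [40.0, 45.0]),  # own threshold not exceeded
    ]
    for a in rank_alerts(sample):
        print(a.rule, round(peer_score(a), 2))
```

Under these assumptions, an alert that most peer thresholds would also have raised ranks above one that only its own rule flags, which matches the abstract's claim that top-ranked alerts are more likely to be true positives.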
Added 21 May 2010
Updated 21 May 2010
Type Conference
Year 2009
Where ICAC
Authors Guofei Jiang, Haifeng Chen, Kenji Yoshihira, Akhilesh Saxena