Sciweavers

ICDM
2009
IEEE

Online System Problem Detection by Mining Patterns of Console Logs

Abstract: We describe a novel application of data mining and statistical learning methods to automatically monitor and detect abnormal execution traces from console logs in an online setting. Unlike existing solutions, we use a two-stage detection system. The first stage uses frequent pattern mining and distribution estimation techniques to capture the dominant patterns (both frequent sequences and time durations). The second stage uses a principal component analysis (PCA) based anomaly detection technique to identify actual problems. Using real system data from a 203-node Hadoop [1] cluster, we show that we can not only achieve highly accurate and fast problem detection, but also help operators better understand execution patterns in their system.

I. MOTIVATION AND OVERVIEW

Internet services today often run in data centers consisting of thousands of servers. At these scales, non-failstop “performance failures” are common and may even indicate serious impending failures. Oper...
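
The PCA-based second stage named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes each execution trace has already been summarized as a vector of per-event-type message counts, and it flags a trace when its squared prediction error (the squared distance from the subspace spanned by the top principal components) exceeds a threshold. The function names, the `n_components` parameter, and the empirical-quantile threshold are all assumptions made for illustration.

```python
import numpy as np


def fit_pca_detector(train, n_components, quantile=0.99):
    """Fit a PCA subspace on mostly-normal traces (rows = traces).

    Returns (mean, basis, threshold). The threshold is an empirical
    quantile of the training errors, an assumption of this sketch.
    """
    mean = train.mean(axis=0)
    centered = train - mean
    # Rows of vt are the principal directions, ordered by variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components].T  # shape: (n_features, n_components)
    residual = centered - centered @ basis @ basis.T
    spe = np.sum(residual ** 2, axis=1)
    threshold = np.quantile(spe, quantile)
    return mean, basis, threshold


def spe_score(x, mean, basis):
    """Squared prediction error: squared norm of the part of x
    that lies outside the principal subspace."""
    centered = np.atleast_2d(x) - mean
    residual = centered - centered @ basis @ basis.T
    return np.sum(residual ** 2, axis=1)


def is_anomalous(x, detector):
    """Return a boolean array, one entry per input trace."""
    mean, basis, threshold = detector
    return spe_score(x, mean, basis) > threshold
```

The design choice here is that normal traces tend to keep their event counts in fixed proportions, so they lie close to a low-dimensional subspace, while a trace that breaks those proportions (for example, a missing closing message) leaves a large residual.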
Wei Xu, Ling Huang, Armando Fox, David Patterson, Michael I. Jordan
Added 23 May 2010
Updated 23 May 2010
Type Conference
Year 2009
Where ICDM
Authors Wei Xu, Ling Huang, Armando Fox, David Patterson, Michael I. Jordan