
HPDC
2008
IEEE

Issues in applying data mining to grid job failure detection and diagnosis

As grid computation systems become larger and more complex, manually diagnosing failures in jobs becomes impractical. Recently, machine-learning techniques have been proposed to detect a variety of application failures in grids. While this is a promising approach, there are many options for how to apply machine learning to this problem, and it is not always obvious which approaches are feasible or effective. We explore some issues that arise when we try to apply existing implementations of data mining algorithms to diagnose as well as predict job failures in grids. We demonstrate that a) it is feasible to gather enough data in real time to train useful classifier algorithms, using only a small fraction of the grid's computational resources, b) it is important to choose the features used for classification with care, and c) it is useful to have both per-user and system-wide classifiers, as they diagnose different kinds of problems. We illustrate all these issues using a prototype sy...
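The abstract does not say which classifiers, features, or data formats the authors used, so the following is only a minimal sketch of point c), not their method. It assumes hypothetical job records with `user`, `site`, and `failed` fields and swaps in a deliberately simple frequency-based classifier (flag a feature value whose observed failure rate exceeds a threshold) as a stand-in for the off-the-shelf data mining algorithms mentioned. It shows how a system-wide model can catch a site that fails for everyone, while a per-user model can catch a problem that only affects one user's jobs and is diluted in the pooled data.

```python
from collections import defaultdict


class FailureRateClassifier:
    """Toy stand-in classifier (hypothetical): predicts failure for a job whose
    value of `feature` failed more often than `threshold` in the training data."""

    def __init__(self, feature, threshold=0.5):
        self.feature = feature
        self.threshold = threshold
        self.counts = defaultdict(lambda: [0, 0])  # value -> [failures, total]

    def fit(self, jobs):
        for job in jobs:
            stats = self.counts[job[self.feature]]
            stats[0] += int(job["failed"])
            stats[1] += 1
        return self

    def predict(self, job):
        failures, total = self.counts.get(job[self.feature], (0, 0))
        # Feature values never seen in training default to "no failure predicted".
        return total > 0 and failures / total > self.threshold


def train_classifiers(jobs, feature):
    """Train one system-wide classifier on all jobs plus one classifier per user."""
    system_wide = FailureRateClassifier(feature).fit(jobs)
    by_user = defaultdict(list)
    for job in jobs:
        by_user[job["user"]].append(job)
    per_user = {user: FailureRateClassifier(feature).fit(user_jobs)
                for user, user_jobs in by_user.items()}
    return system_wide, per_user


# Hypothetical job history: siteA fails for everyone (a system-wide problem),
# while only bob's jobs fail at siteB (a user-specific problem).
jobs = [
    {"user": "alice", "site": "siteA", "failed": True},
    {"user": "alice", "site": "siteA", "failed": True},
    {"user": "alice", "site": "siteB", "failed": False},
    {"user": "alice", "site": "siteB", "failed": False},
    {"user": "alice", "site": "siteB", "failed": False},
    {"user": "bob", "site": "siteA", "failed": True},
    {"user": "bob", "site": "siteB", "failed": True},
    {"user": "bob", "site": "siteB", "failed": True},
]
system_wide, per_user = train_classifiers(jobs, "site")
```

Here the pooled failure rate at siteB is 2/5, below the threshold, so the system-wide model misses bob's problem, while bob's own model (2/2 failures) flags it; conversely, siteA (3/3 failures) is caught system-wide even for a user with no history.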
Lakshmikant Shrinivas, Jeffrey F. Naughton
Added 29 May 2010
Updated 29 May 2010
Type Conference
Year 2008
Where HPDC