Sciweavers

NCA
2008
IEEE

Identifying Failures in Grids through Monitoring and Ranking

In this paper we present FailRank, a novel framework for integrating and ranking information sources that characterize failures in a grid system. Once the failing sites have been ranked, they can be eliminated from the job-scheduling resource pool, yielding a more predictable, dependable and adaptive infrastructure. We also present the tools we developed to evaluate the FailRank framework. In particular, we present the FailBase Repository, a 38GB corpus of state information that characterizes the EGEE Grid for one month in 2007. Such a corpus paves the way for the community to systematically uncover new, previously unknown patterns and rules among the multitude of parameters that can contribute to failures in a Grid environment. Additionally, we present an experimental evaluation of the FailRank system over 30 days, which shows that our framework identifies failures in 93% of the cases. We believe that our work constitutes another important step to...
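To make the rank-then-eliminate idea concrete, here is a minimal Python sketch. It assumes that each monitoring source has already been reduced to a normalized per-site failure indicator in [0, 1] and that the indicators are combined with a linear weighted sum. The indicator names, the weights, the cutoff `k`, and the helpers `failure_score`, `rank_sites` and `schedulable_pool` are all hypothetical illustrations, not FailRank's actual scoring function or API.

```python
from dataclasses import dataclass, field


@dataclass
class SiteReport:
    """Hypothetical snapshot of one grid site: normalized failure indicators (0 = healthy, 1 = failing)."""
    site: str
    indicators: dict = field(default_factory=dict)


def failure_score(report, weights):
    # Assumed linear aggregation: weighted sum of the indicators.
    # An indicator missing from a report is treated as 0 (no evidence of failure).
    return sum(w * report.indicators.get(name, 0.0) for name, w in weights.items())


def rank_sites(reports, weights):
    # Order sites from most to least likely to be failing.
    # sorted() is stable, so ties keep their input order.
    return sorted(reports, key=lambda r: failure_score(r, weights), reverse=True)


def schedulable_pool(reports, weights, k):
    # Exclude the k highest-ranked (most suspect) sites and return the
    # remaining site names in their original order for the job scheduler.
    excluded = {r.site for r in rank_sites(reports, weights)[:k]}
    return [r.site for r in reports if r.site not in excluded]


if __name__ == "__main__":
    # Hypothetical weights over hypothetical indicator names.
    weights = {"failed_probes": 0.5, "job_abort_rate": 0.3, "queue_overload": 0.2}
    reports = [
        SiteReport("site-A", {"failed_probes": 1.0, "job_abort_rate": 0.8}),
        SiteReport("site-B", {"job_abort_rate": 0.1}),
        SiteReport("site-C", {"failed_probes": 0.0, "queue_overload": 0.9}),
    ]
    print([r.site for r in rank_sites(reports, weights)])
    print(schedulable_pool(reports, weights, k=1))
```

With these made-up inputs, site-A scores highest and is removed from the pool when `k=1`, leaving site-B and site-C available for scheduling. A fixed top-k cutoff is only one possible elimination policy; a score threshold would work equally well in this sketch.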
Added 01 Jun 2010
Updated 01 Jun 2010
Type Conference
Year 2008
Where NCA
Authors Demetrios Zeinalipour-Yazti, Kyriakos Neocleous, Chryssis Georgiou, Marios D. Dikaiakos