
IPPS
2006
IEEE

A proactive fault-detection mechanism in large-scale cluster systems

To improve the overall dependability of large-scale cluster systems, this paper proposes an online fault-detection mechanism. The mechanism can detect faults before a node fails, which enables proactive fault management. It works in two steps. First, a model of the cluster system's dynamic behavior during normal operation is built using time series analysis. Second, faults are detected by comparing the system's current running state with this normal-behavior model, and a fault alarm is raised as soon as the current state deviates from the model. Experimental results show that the mechanism detects faults in the cluster system in a timely manner.
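The abstract does not say which time series model or alarm rule the authors use, so the following is only a minimal sketch of the two-step idea under stated assumptions: a single per-node metric (for example CPU load) sampled at a fixed interval, an AR(1) model fitted by least squares as the hypothetical "normal running model", and a hypothetical k·σ threshold on the one-step prediction error as the alarm rule. The names `fit_normal_model` and `detect_faults` are illustrative and do not come from the paper.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class AR1Model:
    """Assumed normal-behaviour model: x_t = c + phi * x_{t-1} + e_t."""
    c: float
    phi: float
    sigma: float  # standard deviation of the one-step residuals on healthy data


def fit_normal_model(series: Sequence[float]) -> AR1Model:
    """Step 1 (sketch): fit AR(1) by ordinary least squares to a metric
    recorded while the node was known to be healthy."""
    if len(series) < 3:
        raise ValueError("need at least 3 samples to fit AR(1)")
    prev, curr = series[:-1], series[1:]
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    var_prev = sum((x - mean_prev) ** 2 for x in prev)
    cov = sum((x - mean_prev) * (y - mean_curr) for x, y in zip(prev, curr))
    phi = cov / var_prev if var_prev > 0 else 0.0
    c = mean_curr - phi * mean_prev
    residuals = [y - (c + phi * x) for x, y in zip(prev, curr)]
    sigma = math.sqrt(sum(r * r for r in residuals) / n)
    return AR1Model(c, phi, sigma)


def detect_faults(model: AR1Model, series: Sequence[float], k: float = 3.0) -> List[int]:
    """Step 2 (sketch): return every index t whose observation deviates from
    the model's one-step prediction by more than k * sigma (hypothetical rule)."""
    alarms = []
    threshold = k * max(model.sigma, 1e-9)  # guard against a zero-variance fit
    for t in range(1, len(series)):
        predicted = model.c + model.phi * series[t - 1]
        if abs(series[t] - predicted) > threshold:
            alarms.append(t)
    return alarms


if __name__ == "__main__":
    # Synthetic "healthy" load trace used only for illustration.
    normal = [50 + 2 * math.sin(i) for i in range(200)]
    model = fit_normal_model(normal)
    live = [50 + 2 * math.sin(i) for i in range(200, 220)]
    live[10] += 20  # inject an anomalous spike
    print(detect_faults(model, live))  # flags indices 10 and 11
```

Because each prediction is made from the previous observed value, a single spike raises two consecutive alarms: one at the spike itself and one at the next sample, whose prediction is computed from the corrupted value. A real deployment would likely track several metrics per node and tune the model order and threshold on historical data.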
Linping Wu, Dan Meng, Wen Gao, Jianfeng Zhan
Added 12 Jun 2010
Updated 12 Jun 2010
Type Conference
Year 2006
Where IPPS
Authors Linping Wu, Dan Meng, Wen Gao, Jianfeng Zhan