—Developing fault management mechanisms is a difficult task because of the unpredictable nature of failures. In this paper, we present a fault simulation framework for Blue Gene...
Narayan Desai, Ewing L. Lusk, Daniel Buettner, And...
In this paper, we describe disparity, a tool that does parallel, scalable anomaly detection for clusters. Disparity uses basic statistical methods and scalable reduction operation...
Chip multiprocessors (CMP) are widely used for high performance computing. Further, these CMPs are being configured in a hierarchical manner to compose a node in a cluster system....
Xingfu Wu, Valerie E. Taylor, Charles W. Lively, S...
Event traces are helpful in understanding the performance behavior of message-passing applications since they allow in-depth analyses of communication and synchronization patterns...
Daniel Becker, John C. Linford, Rolf Rabenseifner,...
Power minimization is a serious issue in wireless sensor networks to extend the lifetime and minimize costs. However, in order to gain an accurate understanding of issues regardin...