Sciweavers

11 search results - page 2 / 3
» A Meta-Learning Failure Predictor for Blue Gene L Systems
ICPPW
2008
IEEE
14 years 20 days ago
Simulating Failures on Large-Scale Systems
Developing fault management mechanisms is a difficult task because of the unpredictable nature of failures. In this paper, we present a fault simulation framework for Blue Gene...
Narayan Desai, Ewing L. Lusk, Daniel Buettner, And...
DSN
2009
IEEE
14 years 29 days ago
System log pre-processing to improve failure prediction
Log preprocessing, a process applied on the raw log before applying a predictive method, is of paramount importance to failure prediction and diagnosis. While existing filtering ...
Ziming Zheng, Zhiling Lan, Byung-Hoon Park, Al Gei...
ICCS
2005
Springer
13 years 11 months ago
Super-Scalable Algorithms for Computing on 100,000 Processors
In the next five years, the number of processors in high-end systems for scientific computing is expected to rise to tens and even hundreds of thousands. For example, the IBM Blu...
Christian Engelmann, Al Geist
IPPS
2010
IEEE
13 years 3 months ago
Scalable parallel I/O alternatives for massively parallel partitioned solver systems
With the development of high-performance computing, I/O issues have become the bottleneck for many massively parallel applications. This paper investigates scalable paral...
Jing Fu, Ning Liu, Onkar Sahni, Kenneth E. Jansen,...
DSN
2007
IEEE
14 years 17 days ago
What Supercomputers Say: A Study of Five System Logs
If we hope to automatically detect and diagnose failures in large-scale computer systems, we must study real deployed systems and the data they generate. Progress has been hampere...
Adam J. Oliner, Jon Stearley