Sciweavers

9 search results - page 2 / 2
» Filtering Failure Logs for a BlueGene/L Prototype
IPPS
2005
IEEE
Performance Implications of Periodic Checkpointing on Large-Scale Cluster Systems
Large-scale systems like BlueGene/L are susceptible to a number of software and hardware failures that can affect system performance. Periodic application checkpointing is a commo...
Adam J. Oliner, Ramendra K. Sahoo, José E. ...
DSN
2009
IEEE
System log pre-processing to improve failure prediction
Log preprocessing, a process applied on the raw log before applying a predictive method, is of paramount importance to failure prediction and diagnosis. While existing filtering ...
Ziming Zheng, Zhiling Lan, Byung-Hoon Park, Al Gei...
DSN
2007
IEEE
What Supercomputers Say: A Study of Five System Logs
If we hope to automatically detect and diagnose failures in large-scale computer systems, we must study real deployed systems and the data they generate. Progress has been hampere...
Adam J. Oliner, Jon Stearley
IPPS
2006
IEEE
Cooperative checkpointing theory
Cooperative checkpointing uses global knowledge of the state and health of the machine to improve performance and reliability by dynamically deciding when to skip checkpoint reque...
Adam J. Oliner, Larry Rudolph, Ramendra K. Sahoo