
ICPPW
2008
IEEE

Simulating Failures on Large-Scale Systems

Developing fault management mechanisms is a difficult task because of the unpredictable nature of failures. In this paper, we present a fault simulation framework for Blue Gene/P systems implemented as a part of the Cobalt resource manager. The primary goal of this framework is to support system software development. We also present a hardware diagnostic system that we have implemented using this framework.
Added: 30 May 2010
Updated: 30 May 2010
Type: Conference
Year: 2008
Where: ICPPW
Authors: Narayan Desai, Ewing L. Lusk, Daniel Buettner, Andrew Cherry, Theron Voran