Sciweavers

24 search results - page 1 / 5
ICPPW
2009
IEEE
13 years 2 months ago
Analyzing Checkpointing Trends for Applications on the IBM Blue Gene/P System
Current petascale systems have tens of thousands of hardware components and complex system software stacks, which increase the probability of faults occurring during the lifetime ...
Harish Gapanati Naik, Rinku Gupta, Pete Beckman
IPPS
2006
IEEE
13 years 10 months ago
Cooperative checkpointing theory
Cooperative checkpointing uses global knowledge of the state and health of the machine to improve performance and reliability by dynamically deciding when to skip checkpoint reque...
Adam J. Oliner, Larry Rudolph, Ramendra K. Sahoo
IPPS
2005
IEEE
13 years 10 months ago
Performance Implications of Periodic Checkpointing on Large-Scale Cluster Systems
Large-scale systems like BlueGene/L are susceptible to a number of software and hardware failures that can affect system performance. Periodic application checkpointing is a commo...
Adam J. Oliner, Ramendra K. Sahoo, José E. ...
ICPP
2009
IEEE
13 years 11 months ago
Improving Resource Availability by Relaxing Network Allocation Constraints on Blue Gene/P
High-end computing (HEC) systems have passed the petaflop barrier and continue to move toward the next frontier of exascale computing. As companies and research institutes con...
Narayan Desai, Darius Buntinas, Daniel Buettner, P...
DSN
2006
IEEE
13 years 10 months ago
BlueGene/L Failure Analysis and Prediction Models
The growing computational and storage needs of several scientific applications mandate the deployment of extreme-scale parallel machines, such as IBM’s BlueGene/L which can acc...
Yinglung Liang, Yanyong Zhang, Anand Sivasubramani...