Sciweavers

IPPS 2006, IEEE
Evaluating cooperative checkpointing for supercomputing systems
Cooperative checkpointing, in which the system dynamically skips checkpoints requested by applications at runtime, can exploit system-level information to improve performance and ...
Adam J. Oliner, Ramendra K. Sahoo

DSN 2005, IEEE
Probabilistic QoS Guarantees for Supercomputing Systems
Supercomputing systems must be able to reliably and efficiently complete their assigned workloads, even in the presence of failures. This paper proposes a system that allows the ...
Adam J. Oliner, Larry Rudolph, Ramendra K. Sahoo, ...

ICDCS 2011, IEEE
Provisioning a Multi-tiered Data Staging Area for Extreme-Scale Machines
Massively parallel scientific applications, running on extreme-scale supercomputers, produce hundreds of terabytes of data per run, driving the need for storage solutions to im...
Ramya Prabhakar, Sudharshan S. Vazhkudai, Youngjae...

PVM 2005, Springer
Cooperative Write-Behind Data Buffering for MPI I/O
Many large-scale production parallel programs run for a very long time and require periodic data checkpoints to save the state of the computation for program restart and/o...
Wei-keng Liao, Kenin Coloma, Alok N. Choudhary, Le...