
DSN
2005
IEEE

Probabilistic QoS Guarantees for Supercomputing Systems

Supercomputing systems must complete their assigned workloads reliably and efficiently, even in the presence of failures. This paper proposes a system that lets the system and its users negotiate a mutually desirable risk strategy. To accomplish this, the system makes probabilistic guarantees on quality of service (QoS) of the form “Job j can be completed by deadline d with probability p.” To make such guarantees, the system uses event prediction (forecasting) in conjunction with fault-aware job scheduling and cooperative checkpointing strategies. Using job logs and failure traces from actual high-performance computing systems, we employ trace-based simulations to assess the effects of the prediction accuracy (a) and the user risk strategy (U) on a variety of performance metrics. Compared to a system that does not use event prediction, high forecasting accuracy yielded QoS and utilization improvements of as much as 6%, along with an 89% reduction...
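The negotiation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Offer` type, the `choose_offer` function, the sample numbers, and the reading of the user risk strategy U as a minimum acceptable probability are all assumptions made here for illustration only.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Offer:
    """One hypothetical guarantee: 'the job completes by `deadline` with `probability`'."""
    deadline: float      # candidate deadline d, e.g. hours after submission
    probability: float   # promised probability p of finishing by d


def choose_offer(offers: Sequence[Offer], risk_threshold: float) -> Optional[Offer]:
    """Return the earliest-deadline offer whose probability meets the threshold.

    Assumption: the user's risk strategy U is modeled as the lowest
    probability of success the user is willing to accept.
    """
    acceptable = [o for o in offers if o.probability >= risk_threshold]
    if not acceptable:
        return None  # no offer is safe enough for this user
    return min(acceptable, key=lambda o: o.deadline)


# Made-up offers for one job: later deadlines carry higher promised probabilities.
offers = [Offer(10.0, 0.60), Offer(12.0, 0.85), Offer(16.0, 0.99)]
chosen = choose_offer(offers, risk_threshold=0.80)
print(chosen)
```

Under this reading, a more risk-averse user (higher threshold) trades an earlier deadline for a stronger guarantee, and a threshold that no offer satisfies yields no agreement.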
Added 24 Jun 2010
Updated 24 Jun 2010
Type Conference
Year 2005
Where DSN
Authors Adam J. Oliner, Larry Rudolph, Ramendra K. Sahoo, José E. Moreira, Manish Gupta