
PPOPP
2006
ACM

Predicting bounds on queuing delay for batch-scheduled parallel machines

Most space-sharing parallel computers presently operated by high-performance computing centers use batch-queuing systems to manage processor allocation. In many cases, users wishing to use these batch-queued resources have accounts at multiple sites and have the option of choosing at which site or sites to submit a parallel job. In such a situation, the amount of time a user’s job will wait in any one batch queue can significantly impact the overall time a user waits from job submission to job completion. In this work, we explore a new method for providing end-users with predictions for the bounds on the queuing delay individual jobs will experience. We evaluate this method using batch scheduler logs for distributed-memory parallel machines that cover a 9-year period at 7 large HPC centers. Our results show that it is possible to predict delay bounds reliably for jobs in different queues, and for jobs requesting different ranges of processor counts. Using this information, scienti...
John Brevik, Daniel Nurmi, Richard Wolski
Added 14 Jun 2010
Updated 14 Jun 2010
Type Conference
Year 2006
Where PPOPP