Sciweavers

32 search results - page 1 / 7
» Performance Implications of Failures in Large-Scale Cluster ...
Sort
View
JSSPP
2004
Springer
13 years 10 months ago
Performance Implications of Failures in Large-Scale Cluster Scheduling
As we continue to evolve into large-scale parallel systems, many of them employing hundreds of computing engines to take on mission-critical roles, it is crucial to design those s...
Yanyong Zhang, Mark S. Squillante, Anand Sivasubra...
IPPS
2005
IEEE
13 years 10 months ago
Performance Implications of Periodic Checkpointing on Large-Scale Cluster Systems
Large-scale systems like BlueGene/L are susceptible to a number of software and hardware failures that can affect system performance. Periodic application checkpointing is a commo...
Adam J. Oliner, Ramendra K. Sahoo, José E. ...
CCGRID
2006
IEEE
13 years 10 months ago
A Failure-Aware Scheduling Strategy in Large-Scale Cluster System
As the scale is expanding, node failure becomes a commonplace feature of large-scale cluster systems. As an important part of cluster operating system software, job scheduling tak...
Linping Wu, Dan Meng, Jianfeng Zhan, Wang Lei, Bib...
ESCIENCE
2006
IEEE
13 years 10 months ago
Job Failure Analysis and Its Implications in a Large-Scale Production Grid
In this paper we present an initial analysis of job failures in a large-scale data-intensive Grid. Based on three representative periods in production, we characterize the interar...
Hui Li, David L. Groep, Lex Wolters, Jeffrey Templ...
ECRTS
2007
IEEE
13 years 11 months ago
A Hybrid Real-Time Scheduling Approach for Large-Scale Multicore Platforms
We propose a hybrid approach for scheduling real-time tasks on large-scale multicore platforms with hierarchical shared caches. In this approach, a multicore platform is partition...
John M. Calandrino, James H. Anderson, Dan P. Baum...