A Failure-Aware Scheduling Strategy in Large-Scale Cluster System

As cluster systems grow in scale, node failure becomes a commonplace feature of large-scale cluster systems. As an important part of cluster operating system software, job scheduling is responsible for efficient resource management and reasonable scheduling of jobs. The job scheduling function in a cluster is divided into two sub-parts: job selection and node allocation. In this paper, we introduce a failure-aware scheduling strategy, the LUNF (Longest Uptime Node First) node allocation policy, which is based on a characterization of node failures. Simulation results show that the LUNF policy achieves better system performance than a random node allocation policy.
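The abstract does not spell out how LUNF is implemented, so the following is only a minimal sketch under stated assumptions: a node's "uptime" is taken to be the time elapsed since it last came back up (a hypothetical `last_up_time` field), job selection is plain first-come-first-served, and nodes are chosen once, when a job starts. All names (`Node`, `lunf_allocate`, `make_random_allocator`, `schedule_fcfs`) are illustrative and do not come from the paper. The sketch contrasts the two node allocation rules the abstract compares.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Node:
    """A cluster node (illustrative model, not the paper's).

    last_up_time: hypothetical timestamp at which the node last came back up
    after a failure or reboot; uptime is measured from this point.
    """
    node_id: int
    last_up_time: float
    busy: bool = False

    def uptime(self, now: float) -> float:
        # Time elapsed since the node last (re)started.
        return now - self.last_up_time


# An allocator picks n_required idle nodes at time `now`, or returns None.
Allocator = Callable[[List[Node], int, float], Optional[List[Node]]]


def lunf_allocate(nodes: List[Node], n_required: int, now: float) -> Optional[List[Node]]:
    """Longest Uptime Node First: prefer the idle nodes that have been up longest."""
    idle = [n for n in nodes if not n.busy]
    if len(idle) < n_required:
        return None
    # Longest current uptime first; ties broken by node_id for determinism.
    idle.sort(key=lambda n: (-n.uptime(now), n.node_id))
    chosen = idle[:n_required]
    for n in chosen:
        n.busy = True
    return chosen


def make_random_allocator(seed: int) -> Allocator:
    """Baseline: pick idle nodes uniformly at random, ignoring uptime."""
    rng = random.Random(seed)

    def random_allocate(nodes: List[Node], n_required: int, now: float) -> Optional[List[Node]]:
        idle = [n for n in nodes if not n.busy]
        if len(idle) < n_required:
            return None
        chosen = rng.sample(idle, n_required)
        for n in chosen:
            n.busy = True
        return chosen

    return random_allocate


def schedule_fcfs(queue: List[Tuple[str, int]], nodes: List[Node], now: float,
                  allocate: Allocator) -> List[Tuple[str, List[int]]]:
    """Job selection (FCFS here, an assumption) followed by node allocation.

    Starts jobs from the head of the queue until the head job cannot be placed.
    """
    started = []
    while queue:
        job_id, n_required = queue[0]
        chosen = allocate(nodes, n_required, now)
        if chosen is None:
            break  # strict FCFS: no backfilling past a blocked head job
        queue.pop(0)
        started.append((job_id, [n.node_id for n in chosen]))
    return started


nodes = [Node(0, last_up_time=90.0), Node(1, last_up_time=10.0),
         Node(2, last_up_time=50.0), Node(3, last_up_time=0.0)]
queue = [("job-a", 2), ("job-b", 1), ("job-c", 3)]
print(schedule_fcfs(queue, nodes, now=100.0, allocate=lunf_allocate))
# → [('job-a', [3, 1]), ('job-b', [2])]
```

At `now=100.0` the uptimes of nodes 0 to 3 are 10, 90, 50 and 100, so LUNF gives `job-a` nodes 3 and 1 and `job-b` node 2, while `job-c` waits for three idle nodes. Passing `make_random_allocator(seed=...)` as `allocate` swaps in the random baseline. The sketch models no failures, so it only illustrates the allocation rule, not the performance comparison reported above.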

Linping Wu, Dan Meng, Jianfeng Zhan, Wang Lei, Bibo Tu

CCGRID 2006 | Cluster Computing | Job Scheduling | Node Allocation | Node Allocation Policy

» Failure-Aware Construction and Reconfiguration of Distributed Virtual Machines for High Ava...

» Design and Analysis of a Dynamic Scheduling Strategy with Resource Estimation for Large-Sca...

» A Hybrid Real-Time Scheduling Approach for Large-Scale Multicore Platforms

» Monitoring and Debugging Parallel Software with BCS-MPI on Large-Scale Clusters

» Performance Analysis of Grid DAG Scheduling Algorithms using MONARC Simulation Tool

» An Energy-Efficient Framework for Large-Scale Parallel Storage Systems

» Network coding for large scale content distribution

» A Robust Scheduling Strategy for Moldable Scheduling of Parallel Jobs

Post Info

Added	10 Jun 2010
Updated	10 Jun 2010
Type	Conference
Year	2006
Where	CCGRID
Authors	Linping Wu, Dan Meng, Jianfeng Zhan, Wang Lei, Bibo Tu