Packing the most onto your cloud

12 years 8 months ago
Packing the most onto your cloud
Parallel dataflow programming frameworks such as Map-Reduce are increasingly being used for large scale data analysis on computing clouds. It is therefore becoming important to automatically optimize the performance of these frameworks. In this paper, we deal with one particular optimization problem, namely scheduling sets of Map-Reduce jobs on a cluster of machines. We present a scheduler that takes job characteristics into account and finds a schedule that minimizes the total completion time of the set of jobs. Our scheduler decides on the number of machines to assign to each job, and it tries to pack as many jobs on the machines as the machine resources can support. To enable flexible assignment of jobs onto machines, we run the Map-Reduce jobs in virtual machines. Our scheduling problem is formulated as a constrained optimization problem, and we experimentally demonstrate using the Hadoop open source Map-Reduce implementation that the solution to this problem results in benefi...
Ashraf Aboulnaga, Ziyu Wang, Zi Ye Zhang
Added 26 May 2010
Updated 26 May 2010
Type Conference
Year 2009
Where CIKM
Authors Ashraf Aboulnaga, Ziyu Wang, Zi Ye Zhang
Comments (0)