Scarlett: Coping with Skewed Content Popularity in MapReduce Clusters


Download eurosys2011.cs.uni-salzburg.at

To improve data availability and resilience, MapReduce frameworks use file systems that replicate data uniformly. However, analysis of job logs from a large production cluster shows wide disparity in data popularity. Machines and racks storing popular content become bottlenecks, increasing the completion times of jobs that access this data even when other machines in the cluster have spare cycles. To address this problem, we present Scarlett, a system that replicates blocks based on their popularity. By accurately predicting file popularity and working within hard bounds on additional storage, Scarlett causes minimal interference to running jobs. Trace-driven simulations and experiments in two popular MapReduce frameworks (Hadoop and Dryad) show that Scarlett effectively alleviates hotspots and can speed up jobs by ..

Categories and Subject Descriptors: D.. [Operating Systems]: File Systems Management–Distributed file systems

General Terms: Algorithms, Measu...
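The idea in the abstract, replicating by popularity while staying within a hard bound on additional storage, can be illustrated with a small sketch. This is not the algorithm from the paper: the greedy rule, the `FileStats` record, the `allocate_replicas` function, and the default of three base replicas are all assumptions made for illustration. The sketch repeatedly gives one extra replica to the file with the highest predicted accesses per existing replica, and stops considering a file once another copy of it would exceed the budget.

```python
import heapq
from dataclasses import dataclass


@dataclass
class FileStats:
    """Hypothetical per-file record; not a structure from the paper."""
    name: str                # assumed unique across files
    size_bytes: int
    predicted_accesses: int  # assumed popularity estimate for the next window


def allocate_replicas(files, budget_bytes, base_replicas=3):
    """Greedily hand out extra replicas under a hard storage budget.

    Returns a dict mapping file name to its total replica count.
    """
    replicas = {f.name: base_replicas for f in files}

    # Max-heap (via negated keys) on predicted accesses per existing replica.
    # Files with no predicted accesses or no size are never given extra copies.
    heap = [
        (-f.predicted_accesses / base_replicas, f.name, f)
        for f in files
        if f.predicted_accesses > 0 and f.size_bytes > 0
    ]
    heapq.heapify(heap)

    remaining = budget_bytes
    while heap:
        _, name, f = heapq.heappop(heap)
        if f.size_bytes > remaining:
            # One more copy would break the budget; drop this file for good.
            continue
        remaining -= f.size_bytes
        replicas[name] += 1
        # Re-insert with the per-replica load lowered by the new copy.
        heapq.heappush(heap, (-f.predicted_accesses / replicas[name], name, f))

    return replicas
```

For example, calling `allocate_replicas([FileStats("hot", 100, 90), FileStats("cold", 100, 3)], budget_bytes=250)` spends the budget on the heavily accessed file and leaves the rarely accessed one at the base replication factor. A greedy rule was chosen here only because it makes the budget constraint easy to see; other ways of dividing the budget are equally plausible.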

Ganesh Ananthanarayanan, Sameer Agarwal, Srikanth


EUROSYS 2011 | Measurement Performance | Minimal Interference | Production Cluster | Software Engineering |


Post Info

Added	28 Aug 2011
Updated	28 Aug 2011
Type	Journal
Year	2011
Where	EUROSYS
Authors	Ganesh Ananthanarayanan, Sameer Agarwal, Srikanth Kandula, Albert G. Greenberg, Ion Stoica, Duke Harlan, Ed Harris
