Sciweavers

EUROSYS
2011
ACM

Scarlett: coping with skewed content popularity in mapreduce clusters

To improve data availability and resilience, MapReduce frameworks use file systems that replicate data uniformly. However, analysis of job logs from a large production cluster shows wide disparity in data popularity. Machines and racks storing popular content become bottlenecks, increasing the completion times of jobs that access this data even when other machines in the cluster have spare cycles. To address this problem, we present Scarlett, a system that replicates blocks based on their popularity. By accurately predicting file popularity and working within hard bounds on additional storage, Scarlett causes minimal interference to running jobs. Trace-driven simulations and experiments in two popular MapReduce frameworks (Hadoop and Dryad) show that Scarlett effectively alleviates hotspots and can speed up jobs by ..
Categories and Subject Descriptors: D.. [Operating Systems]: File Systems Management–Distributed file systems
General Terms: Algorithms, Measu...
Added 28 Aug 2011
Updated 28 Aug 2011
Type Journal
Year 2011
Where EUROSYS
Authors Ganesh Ananthanarayanan, Sameer Agarwal, Srikanth Kandula, Albert G. Greenberg, Ion Stoica, Duke Harlan, Ed Harris