MapReduce is emerging as an important programming model for large-scale data-parallel applications such as web indexing, data mining, and scientific simulation. Hadoop is an open-...
Matei Zaharia, Andy Konwinski, Anthony D. Joseph, ...
Parallel dataflow programming frameworks such as Map-Reduce are increasingly being used for large scale data analysis on computing clouds. It is therefore becoming important to a...
To improve data availability and resilience MapReduce frameworks use file systems that replicate data uniformly. However, analysis of job logs from a large production cluster show...
We present an automatic skew mitigation approach for userdefined MapReduce programs and present SkewTune, a system that implements this approach as a drop-in replacement for an e...
YongChul Kwon, Magdalena Balazinska, Bill Howe, Je...
This paper addresses the problem of scheduling concurrent jobs on clusters where application data is stored on the computing nodes. This setting, in which scheduling computations ...
Michael Isard, Vijayan Prabhakaran, Jon Currey, Ud...