Data Partitioning for Minimizing Transferred Data in MapReduce

Reducing data transfer in MapReduce’s shuffle phase is very important because it increases the data locality of reduce tasks and thus decreases the overhead of job executions. In the literature, several optimizations have been proposed to reduce data transfer between mappers and reducers. Nevertheless, all these approaches are limited by how intermediate key-value pairs are distributed over map outputs. In this paper, we address the problem of high data transfer in MapReduce and propose a technique that repartitions the tuples of the input datasets, thereby optimizing the distribution of key-value pairs over mappers and increasing the data locality of reduce tasks. Our approach captures the relationships between input tuples and intermediate keys by monitoring the execution of a set of MapReduce jobs that are representative of the workload. Then, based on those relationships, it assigns input tuples to the appropriate chunks. We evaluated our approach through experimentation in a Hadoop de...
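The abstract does not spell out how tuples are assigned to chunks, so the following is only a minimal sketch of the idea under stated assumptions. It uses a simple greedy heuristic, which is a swapped-in technique and not the authors' method. The names `tuple_keys`, `transferred_pairs`, `greedy_partition`, `num_chunks`, and `capacity` are hypothetical. The cost model is also an assumption: each chunk is mapped on its own node, that node also runs a reducer, and each key is reduced where most of its pairs already sit.

```python
from collections import Counter, defaultdict


def transferred_pairs(tuple_keys, assignment, num_chunks):
    """Count intermediate pairs that must cross the network.

    Assumed cost model (hypothetical): chunk c is mapped on node c, node c
    also runs one reducer, and each key is reduced on the node that already
    holds most of its pairs, so only the remaining pairs are shuffled.
    """
    per_key = defaultdict(lambda: [0] * num_chunks)
    for tid, keys in tuple_keys.items():
        for key in keys:
            per_key[key][assignment[tid]] += 1
    total = sum(sum(counts) for counts in per_key.values())
    local = sum(max(counts) for counts in per_key.values())
    return total - local


def greedy_partition(tuple_keys, num_chunks, capacity):
    """Assign tuples to chunks with a simple greedy heuristic (not the paper's
    algorithm): put each tuple in the non-full chunk that already holds the
    most occurrences of its intermediate keys, breaking ties by smaller chunk."""
    chunk_key_counts = [Counter() for _ in range(num_chunks)]
    chunk_sizes = [0] * num_chunks
    assignment = {}
    for tid, keys in tuple_keys.items():
        best, best_score = None, -1
        for c in range(num_chunks):
            if chunk_sizes[c] >= capacity:
                continue  # chunk is full
            score = sum(chunk_key_counts[c][key] for key in keys)
            if score > best_score or (
                score == best_score and chunk_sizes[c] < chunk_sizes[best]
            ):
                best, best_score = c, score
        if best is None:
            raise ValueError("num_chunks * capacity is smaller than the input")
        assignment[tid] = best
        chunk_key_counts[best].update(keys)
        chunk_sizes[best] += 1
    return assignment


# Hypothetical monitoring output: intermediate keys emitted per input tuple.
tuple_keys = {"t1": ["a"], "t2": ["b"], "t3": ["a"], "t4": ["b"]}
baseline = {"t1": 0, "t2": 0, "t3": 1, "t4": 1}  # chunks split by input order
print(transferred_pairs(tuple_keys, baseline, 2))  # 2
greedy = greedy_partition(tuple_keys, num_chunks=2, capacity=2)
print(greedy)  # {'t1': 0, 't2': 1, 't3': 0, 't4': 1}
print(transferred_pairs(tuple_keys, greedy, 2))  # 0
```

In this toy input, splitting by input order forces two pairs across the network, while grouping tuples that share keys makes every key local to one chunk. Because the heuristic is a single pass, its result depends on the order in which tuples are visited.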

Miguel Liroz-Gistau, Reza Akbarinia, Divyakant Agrawal, Esther Pacitti, Patrick Valduriez

GLOBE 2013 | Information Technology

» YSmart: Yet Another SQL-to-MapReduce Translator

» MapReduce: Simplified Data Processing on Large Clusters

» Llama: Leveraging Columnar Storage for Scalable Join Processing in the MapReduce Framework

» Osprey: Implementing MapReduce-Style Fault Tolerance in a Shared-Nothing Distributed Database

» A Platform for Scalable One-Pass Analytics Using MapReduce

» Evaluating MapReduce for Multi-core and Multiprocessor Systems

» Processing Theta-Joins Using MapReduce

» An Experience Report on Scaling Tools for Mining Software Repositories Using MapReduce

» Map-Reduce-Merge: Simplified Relational Data Processing on Large Clusters

Post Info

Added	28 Apr 2014
Updated	28 Apr 2014
Type	Journal
Year	2013
Where	GLOBE
Authors	Miguel Liroz-Gistau, Reza Akbarinia, Divyakant Agrawal, Esther Pacitti, Patrick Valduriez