Sciweavers

ICDE
2012
IEEE

Load Balancing in MapReduce Based on Scalable Cardinality Estimates

MapReduce has emerged as a popular tool for distributed and scalable processing of massive data sets and is increasingly being used in e-science applications. Unfortunately, the performance of MapReduce systems strongly depends on an even data distribution, while scientific data sets are often highly skewed. The resulting load imbalance, which increases processing time, is further amplified by the high runtime complexity of the reducer tasks. An adaptive load balancing strategy is required for appropriate skew handling. In this paper, we address the problem of estimating the cost of the tasks that are distributed to the reducers based on a given cost model. A realistic cost estimate is the basis for adaptive load balancing algorithms and requires gathering statistics from the mappers. This is challenging: (a) Since the statistics from all mappers must be integrated, the mapper statistics must be small. (b) Although each mapper sees only a small fraction of the data, the integrat...
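To make the cost-estimation idea in the abstract concrete, here is a minimal sketch, not the authors' method. It assumes integer keys, exact per-key tuple counts as the mapper statistics (the abstract calls for small statistics, so the paper presumably uses something more compact), a modulo partitioner, and a hypothetical quadratic reducer cost model. All function names are illustrative.

```python
from collections import Counter


def mapper_statistics(records):
    """Local statistics of one mapper: number of tuples per key (exact counts, an assumption)."""
    return Counter(key for key, _value in records)


def integrate(per_mapper_stats):
    """Merge the local statistics of all mappers into global per-key counts."""
    total = Counter()
    for stats in per_mapper_stats:
        total.update(stats)
    return total


def partition(key, num_reducers):
    """Hypothetical partitioner: integer keys are assigned by modulo."""
    return key % num_reducers


def reducer_loads(global_counts, num_reducers, cost=lambda n: n * n):
    """Return (tuples per reducer, estimated cost per reducer).

    The default cost model is quadratic in the size of each key group;
    this is an illustrative assumption, not a model taken from the paper.
    """
    tuples = [0] * num_reducers
    costs = [0] * num_reducers
    for key, n in global_counts.items():
        r = partition(key, num_reducers)
        tuples[r] += n
        costs[r] += cost(n)
    return tuples, costs


# Two mappers, each seeing only part of a skewed data set (key 0 dominates).
mapper_1 = [(0, "a"), (0, "b"), (0, "c"), (1, "d")]
mapper_2 = [(0, "e"), (1, "f"), (2, "g"), (3, "h")]

global_counts = integrate([mapper_statistics(mapper_1), mapper_statistics(mapper_2)])
tuples, costs = reducer_loads(global_counts, num_reducers=2)
print(tuples)  # [5, 3]
print(costs)   # [17, 5]
```

In this toy input the two reducers receive 5 and 3 tuples, but their estimated costs are 17 and 5: the nonlinear cost model turns a moderate imbalance in tuple counts into a much larger imbalance in work, which is the amplification effect the abstract describes.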
Benjamin Gufler, Nikolaus Augsten, Angelika Reiser
Added 28 Sep 2012
Updated 28 Sep 2012
Type Journal
Year 2012
Where ICDE
Authors Benjamin Gufler, Nikolaus Augsten, Angelika Reiser, Alfons Kemper