Enhancing MapReduce via Asynchronous Data Processing

The MapReduce programming model simplifies large-scale data processing on commodity clusters by having users specify a map function that processes input key/value pairs to generate intermediate key/value pairs, and a reduce function that merges and converts intermediate key/value pairs into final results. Typical MapReduce implementations such as Hadoop enforce barrier synchronization between the map and reduce phases, i.e., the reduce phase does not start until all map tasks are finished. This synchronization requirement can cause inefficient utilization of computing resources and can adversely impact performance. We therefore present and evaluate two different approaches to cope with the synchronization drawback of existing MapReduce implementations. The first approach, hierarchical reduction, starts a reduce task as soon as a predefined number of map tasks complete; it then aggregates the results of the different reduce tasks following a tree structure. The second approach, increm...
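The hierarchical reduction idea described in the abstract can be illustrated with a small, single-process sketch. This is not the authors' implementation and does not use Hadoop. It uses word counting as the example job, and the names map_task, reduce_task, hierarchical_reduce, and the fan_in parameter are hypothetical and introduced only for illustration. Map tasks are simulated as finishing one after another. The sketch also assumes the reduce function is associative and commutative, so that partial results can be merged in any grouping.

```python
from collections import Counter
from typing import Iterable, List


def map_task(chunk: str) -> Counter:
    """Map: count the words in one input split (hypothetical word-count job)."""
    return Counter(chunk.split())


def reduce_task(partials: Iterable[Counter]) -> Counter:
    """Reduce: merge partial counts; assumed associative and commutative."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)
    return total


def hierarchical_reduce(chunks: List[str], fan_in: int = 2) -> Counter:
    """Reduce map outputs in groups of `fan_in`, then merge the results as a tree."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")

    level: List[Counter] = []    # outputs of the first-level reduce tasks
    pending: List[Counter] = []  # map outputs not yet handed to a reduce task

    # The loop stands in for map tasks completing one by one. A reduce task
    # starts as soon as `fan_in` map outputs are available, without waiting
    # for the remaining map tasks (no global barrier).
    for chunk in chunks:
        pending.append(map_task(chunk))
        if len(pending) == fan_in:
            level.append(reduce_task(pending))
            pending = []
    if pending:
        level.append(reduce_task(pending))

    # Aggregate the reduce outputs level by level, following a tree structure.
    while len(level) > 1:
        level = [
            reduce_task(level[i:i + fan_in])
            for i in range(0, len(level), fan_in)
        ]
    return level[0] if level else Counter()


if __name__ == "__main__":
    splits = ["a b a", "b c", "a", "c c"]
    print(hierarchical_reduce(splits, fan_in=2))
```

Under these assumptions the tree-shaped aggregation yields the same counts as a single reduce over all map outputs; the difference is only in when the reduce work can begin.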
Marwa Elteir, Heshan Lin, Wu-chun Feng
Added 12 Feb 2011
Updated 12 Feb 2011
Type Journal
Year 2010