Enhancing MapReduce via Asynchronous Data Processing

15 years 1 month ago

The MapReduce programming model simplifies large-scale data processing on commodity clusters by having users specify a map function that processes input key/value pairs to generate intermediate key/value pairs, and a reduce function that merges and converts intermediate key/value pairs into final results. Typical MapReduce implementations such as Hadoop enforce barrier synchronization between the map and reduce phases, i.e., the reduce phase does not start until all map tasks are finished. In turn, this synchronization requirement can cause inefficient utilization of computing resources and can adversely impact performance. Thus, we present and evaluate two different approaches to cope with the synchronization drawback of existing MapReduce implementations. The first approach, hierarchical reduction, starts a reduce task as soon as a predefined number of map tasks completes; it then aggregates the results of different reduce tasks following a tree structure. The second approach, increm...
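The contrast between barrier synchronization and hierarchical reduction can be illustrated with a small single-process sketch. This is not the authors' implementation and it does not use Hadoop. The word-count workload, the function names (`map_task`, `reduce_task`, `hierarchical_reduce`), and the `fan_in` parameter are all assumptions introduced here for illustration. Map outputs are consumed from an iterator to mimic tasks finishing one at a time, and a partial reduce runs as soon as `fan_in` outputs are available. The partial results are then merged level by level in a tree.

```python
from collections import Counter
from typing import Iterable, List


def map_task(chunk: str) -> Counter:
    """Map: turn one input chunk into intermediate (word, count) pairs."""
    return Counter(chunk.split())


def reduce_task(partials: Iterable[Counter]) -> Counter:
    """Reduce: merge intermediate pairs by summing the counts for each key."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)
    return total


def hierarchical_reduce(map_outputs: Iterable[Counter], fan_in: int = 2) -> Counter:
    """Hypothetical sketch: reduce each group of `fan_in` map outputs as soon
    as the group is complete, then merge the partial results in a tree."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")

    level: List[Counter] = []
    pending: List[Counter] = []
    for output in map_outputs:  # outputs arrive one by one as map tasks finish
        pending.append(output)
        if len(pending) == fan_in:
            # Enough map tasks have completed: reduce without waiting for the rest.
            level.append(reduce_task(pending))
            pending = []
    if pending:
        # Leftover outputs that did not fill a complete group.
        level.append(reduce_task(pending))

    # Upper tree levels: keep merging groups of partial results until one remains.
    while len(level) > 1:
        level = [reduce_task(level[i:i + fan_in]) for i in range(0, len(level), fan_in)]
    return level[0] if level else Counter()


chunks = ["a b a", "b c", "a c c", "b"]

# Barrier style: every map task finishes before the single reduce starts.
barrier_result = reduce_task([map_task(chunk) for chunk in chunks])

# Hierarchical style: map outputs are generated lazily and reduced in groups.
tree_result = hierarchical_reduce((map_task(chunk) for chunk in chunks), fan_in=2)
```

Both paths produce the same counts here because summing counts is associative and commutative. A reduce function without that property could not be split across tree levels in this simple way.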

Marwa Elteir, Heshan Lin, Wu-chun Feng
