
SIGMOD 2011, ACM

A platform for scalable one-pass analytics using MapReduce

Today’s one-pass analytics applications tend to be data-intensive in nature and require the ability to process high volumes of data efficiently. MapReduce is a popular programming model for processing large datasets using a cluster of machines. However, the traditional MapReduce model is not well-suited for one-pass analytics, since it is geared towards batch processing and requires the data set to be fully loaded into the cluster before running analytical queries. This paper examines, from a systems standpoint, what architectural design changes are necessary to bring the benefits of the MapReduce model to incremental one-pass analytics. Our empirical and theoretical analyses of Hadoop-based MapReduce systems show that the widely used sort-merge implementation for partitioning and parallel processing poses a fundamental barrier to incremental one-pass analytics, despite various optimizations. To address these limitations, we propose a new data analysis platform that employs hash t...
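The abstract is cut off before it describes the proposed platform, so what follows is only a generic illustration, not the authors' design. It is a minimal single-process Python sketch of why sort-based grouping blocks incremental output while hash-based grouping does not. `sort_merge_aggregate` must buffer and sort every record before it can finalize any group. `hash_aggregate_incremental` updates per-key state in a hash table as each record arrives and can report running totals after every record. The function names and the `(key, value)` record format are hypothetical, and a real Hadoop job sorts map output and merges it at reducers across many machines rather than in one list.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical record format: (group key, numeric value).
Record = Tuple[str, int]


def sort_merge_aggregate(records: Iterable[Record]) -> Dict[str, int]:
    """Sum values per key by sorting first, then grouping adjacent keys.

    Blocking: no group total is available until every record has been
    buffered and sorted, mirroring the barrier a sort-merge step imposes.
    """
    buffered = sorted(records, key=itemgetter(0))
    return {
        key: sum(value for _, value in group)
        for key, group in groupby(buffered, key=itemgetter(0))
    }


def hash_aggregate_incremental(records: Iterable[Record]) -> Iterator[Dict[str, int]]:
    """Sum values per key with a hash table, yielding running totals.

    Non-blocking: each record updates its key's state in place, so a
    snapshot of partial answers can be emitted after every record.
    """
    state: Dict[str, int] = defaultdict(int)
    for key, value in records:
        state[key] += value
        yield dict(state)  # copy so later updates do not alter earlier snapshots
```

For the input `[("a", 1), ("b", 2), ("a", 3)]`, the hash-based version yields `{"a": 1}` after the first record, which is an early partial answer. Its final snapshot, `{"a": 4, "b": 2}`, matches what the sort-based version returns only once all input has been consumed.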
Type Conference
Year 2011
Where SIGMOD
Authors Boduo Li, Edward Mazur, Yanlei Diao, Andrew McGregor, Prashant J. Shenoy