HaLoop: Efficient Iterative Data Processing on Large Clusters

The growing demand for large-scale data mining and data analysis applications has led both industry and academia to design new types of highly scalable data-intensive computing platforms. MapReduce and Dryad are two popular platforms in which the dataflow takes the form of a directed acyclic graph of operators. These platforms lack built-in support for iterative programs, which arise naturally in many applications including data mining, web ranking, graph analysis, model fitting, and so on. This paper presents HaLoop, a modified version of the Hadoop MapReduce framework that is designed to serve these applications. HaLoop not only extends MapReduce with programming support for iterative applications, it also dramatically improves their efficiency by making the task scheduler loop-aware and by adding various caching mechanisms. We evaluated HaLoop on real queries and real datasets. Compared with Hadoop, on average, HaLoop reduces
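The abstract attributes HaLoop's efficiency gains to loop-aware scheduling and caching. The following is a minimal single-process Python sketch of the underlying idea only: data that does not change between iterations is prepared once and reused by every pass of an iterative, map/reduce-style computation. It does not use HaLoop's or Hadoop's API; the function names (`build_adjacency`, `rank_step`, `iterate_to_fixpoint`), the PageRank-style update, and the damping and tolerance values are all assumptions chosen for illustration.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Group the loop-invariant edge list by source node, once, before the loop."""
    adjacency = defaultdict(list)
    for src, dst in edges:
        adjacency[src].append(dst)
    return dict(adjacency)


def rank_step(adjacency, ranks, damping):
    """One iteration: emit rank shares along edges, then sum them per target node."""
    contributions = defaultdict(float)
    for src, targets in adjacency.items():
        share = ranks[src] / len(targets)   # "map": split rank over out-edges
        for dst in targets:
            contributions[dst] += share     # "reduce": sum shares by key
    n = len(ranks)
    return {node: (1 - damping) / n + damping * contributions[node]
            for node in ranks}


def iterate_to_fixpoint(edges, damping=0.85, tolerance=1e-9, max_iterations=100):
    """Repeat rank_step until the ranks stop changing (or the cap is reached).

    Assumes every node has at least one outgoing edge.
    Returns (ranks, number_of_iterations_run).
    """
    nodes = sorted({node for edge in edges for node in edge})
    adjacency = build_adjacency(edges)      # invariant data, reused every iteration
    ranks = {node: 1.0 / len(nodes) for node in nodes}
    for iteration in range(1, max_iterations + 1):
        new_ranks = rank_step(adjacency, ranks, damping)
        delta = max(abs(new_ranks[node] - ranks[node]) for node in nodes)
        ranks = new_ranks
        if delta < tolerance:
            break
    return ranks, iteration


if __name__ == "__main__":
    example_edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")]
    final_ranks, iterations = iterate_to_fixpoint(example_edges)
    print(iterations, final_ranks)
```

In a plain MapReduce driver, each iteration would typically be a separate job that reloads and reshuffles the edge data; the sketch hoists that work out of the loop, which is the kind of saving a loop-aware system can apply automatically.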

Yingyi Bu, Bill Howe, Magdalena Balazinska, Michael Ernst

Data Mining | Data-intensive Computing Platforms | Directed Acyclic Graph | PVLDB 2010

» Graphics Hardware based Efficient and Scalable Fuzzy CMeans Clustering

» BIRCH An Efficient Data Clustering Method for Very Large Databases

» IMDC An ImageMapped Data Clustering Technique for Large Datasets

» Mapreducemerge simplified relational data processing on large clusters

» A Fast Clustering Algorithm to Cluster Very Large Categorical Data Sets in Data Mining

» A highly efficient multicore algorithm for clustering extremely large datasets

» SCOPE easy and efficient parallel processing of massive data sets

» Swift Scalable weighted iterative sampling for flow cytometry clustering

Post Info

Added	20 May 2011
Updated	20 May 2011
Type	Journal
Year	2010
Where	PVLDB
Authors	Yingyi Bu, Bill Howe, Magdalena Balazinska, Michael Ernst

Sciweavers
