A sampling-based framework for parallel data mining

The goal of a data mining algorithm is to discover useful information embedded in large databases. Frequent itemset mining and sequential pattern mining are two important data mining problems with broad applications. Perhaps the most efficient way to solve these problems sequentially is to apply a pattern-growth algorithm, which is a divide-and-conquer algorithm [9, 10]. In this paper, we present a framework for parallel mining of frequent itemsets and sequential patterns based on the divide-and-conquer strategy of pattern growth. Then, we discuss the load balancing problem and introduce a sampling technique, called selective sampling, to address this problem. We implemented parallel versions of both frequent itemset and sequential pattern mining algorithms following our framework. The experimental results show that our parallel algorithms usually achieve excellent speedups.

Categories and Subject Descriptors: D.1 [Programming Techniques]: Concurrent programming—parallel programming; H.2...
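The divide-and-conquer idea mentioned in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation: the function names (`frequent_items`, `project`, `grow`, `parallel_mine`), the list-of-lists transaction format, and the use of a thread pool are all assumptions made for illustration. The sketch only shows that each frequent item defines an independent subproblem (its projected database) that could be handed to a separate worker. The selective sampling technique is not shown, since this page does not describe it.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def frequent_items(db, min_support):
    """Items that occur in at least `min_support` transactions of `db`."""
    counts = Counter(item for txn in db for item in set(txn))
    return sorted(i for i, c in counts.items() if c >= min_support)


def project(db, item):
    """Transactions containing `item`, restricted to items ordered after it."""
    return [[x for x in txn if x > item] for txn in db if item in txn]


def grow(prefix, db, min_support):
    """Recursively mine every frequent itemset that extends `prefix`."""
    results = {}
    for item in frequent_items(db, min_support):
        pattern = prefix + (item,)
        projected = project(db, item)
        results[pattern] = len(projected)  # support of `pattern`
        results.update(grow(pattern, projected, min_support))
    return results


def mine_task(args):
    """One independent subproblem: all patterns whose first item is `item`."""
    item, db, min_support = args
    projected = project(db, item)
    out = {(item,): len(projected)}
    out.update(grow((item,), projected, min_support))
    return out


def parallel_mine(db, min_support, workers=4):
    """Split the search by first item and mine the pieces concurrently."""
    tasks = [(item, db, min_support) for item in frequent_items(db, min_support)]
    merged = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(mine_task, tasks):
            merged.update(partial)
    return merged


db = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
patterns = parallel_mine(db, min_support=3)
```

In this sketch the subproblems can differ greatly in size (the task for the first item sees the longest projected transactions), which hints at why load balancing matters in such a scheme. Threads are used only to keep the example short; in CPython they would not give a real speedup for this CPU-bound work.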

Shengnan Cong, Jiawei Han, Jay Hoeflinger, David A

Data Mining | Distributed And Parallel Computing | Mining | PPOPP 2005 | Sequential Pattern Mining |

» PerfExplorer: A Performance Data Mining Framework For Large-Scale Parallel Computing

» Performance Issues in Parallelizing Data-Intensive Applications on a Multicore Cluster

» Pervasive parallelism in data mining: dataflow solution to co-clustering large and sparse Ne...

» Data Cube Materialization and Mining over MapReduce

» Troubleshooting Distributed Systems via Data Mining

» A Distributed Kernel Summation Framework for General-Dimension Machine Learning

» SIPping from the Data Firehose

» A Parallel Scalable Infrastructure for OLAP and Data Mining

Post Info

Added	26 Jun 2010
Updated	26 Jun 2010
Type	Conference
Year	2005
Where	PPOPP
Authors	Shengnan Cong, Jiawei Han, Jay Hoeflinger, David A. Padua
