Out-of-core frequent pattern mining on a commodity PC

In this work we focus on the problem of frequent itemset mining on large, out-of-core data sets. After presenting a characterization of existing out-of-core frequent itemset mining algorithms and their drawbacks, we introduce our efficient, highly scalable solution. Presented in the context of the FPGrowth algorithm, our technique involves several novel I/O-conscious optimizations, such as approximate hash-based sorting and blocking, and leverages recent architectural advancements in commodity computers, such as 64-bit processing. We evaluate the proposed optimizations on truly large data sets, up to 75GB, and show they yield greater than a 400-fold execution time improvement. Finally, we discuss the impact of this research in the context of other pattern mining challenges, such as sequence mining and graph mining.

Categories and Subject Descriptors: H.2.8 [Database Management]: Database Applications - Data Mining
General Terms: Algorithms, Performance

Gregory Buehrer, Srinivasan Parthasarathy, Amol Ghoting

Data Mining | Frequent Itemset Mining | KDD 2006 | Out-of-core Data Sets | Pattern Mining Challenges

Post Info

Added	30 Nov 2009
Updated	30 Nov 2009
Type	Conference
Year	2006
Where	KDD
Authors	Gregory Buehrer, Srinivasan Parthasarathy, Amol Ghoting
