
KDD 2006, ACM

Out-of-core frequent pattern mining on a commodity PC

In this work we focus on the problem of frequent itemset mining on large, out-of-core data sets. After characterizing existing out-of-core frequent itemset mining algorithms and their drawbacks, we introduce our efficient, highly scalable solution. Presented in the context of the FPGrowth algorithm, our technique involves several novel I/O-conscious optimizations, such as approximate hash-based sorting and blocking, and leverages recent architectural advances in commodity computers, such as 64-bit processing. We evaluate the proposed optimizations on very large data sets, up to 75 GB, and show that they yield more than a 400-fold improvement in execution time. Finally, we discuss the impact of this research in the context of other pattern mining challenges, such as sequence mining and graph mining.

Categories and Subject Descriptors: H.2.8 [Database Management]: Database Applications - Data Mining; General Terms: Algorithms, Performance.
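The abstract frames the work around FPGrowth. For reference, below is a minimal in-memory sketch of the standard FP-Growth algorithm. It first builds an FP-tree of frequency-ordered transactions. It then recursively mines each item's conditional pattern base.

This sketch does not implement the paper's out-of-core, I/O-conscious optimizations, such as approximate hash-based sorting or blocking. The names `Node`, `build_tree`, and `fp_growth` are hypothetical, chosen for illustration. The representation of transactions as `(items, count)` pairs is also an assumption.

```python
from collections import defaultdict


class Node:
    """One FP-tree node: an item, its count, a parent link, and children."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_tree(transactions, min_support):
    """Build an FP-tree from (items, count) pairs (assumed input format).

    Returns the root, a header table (item -> nodes holding that item),
    and the support of every frequent item.
    """
    support = defaultdict(int)
    for items, count in transactions:
        for item in set(items):
            support[item] += count
    frequent = {i: s for i, s in support.items() if s >= min_support}
    root = Node(None, None)
    header = defaultdict(list)
    for items, count in transactions:
        # Keep only frequent items, ordered by descending support (ties by item).
        path = sorted((i for i in set(items) if i in frequent),
                      key=lambda i: (-frequent[i], i))
        node = root
        for item in path:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return root, header, frequent


def fp_growth(transactions, min_support, suffix=()):
    """Yield (sorted itemset tuple, support) for every frequent itemset."""
    _, header, frequent = build_tree(transactions, min_support)
    for item in sorted(frequent, key=lambda i: (frequent[i], i)):
        yield tuple(sorted(suffix + (item,))), frequent[item]
        # Conditional pattern base: the prefix path above each node of `item`,
        # weighted by that node's count.
        base = []
        for node in header[item]:
            path = []
            parent = node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        # Mine the conditional database, growing the suffix by `item`.
        yield from fp_growth(base, min_support, suffix + (item,))


if __name__ == "__main__":
    db = [["bread", "milk"], ["bread", "butter"], ["bread", "milk", "butter"]]
    for itemset, count in fp_growth([(t, 1) for t in db], min_support=2):
        print(itemset, count)
```

Each conditional pattern base contains only items ranked above the current item, so every frequent itemset is emitted exactly once. Scaling the tree and these recursive projections beyond main memory is the problem the paper addresses.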
Added 30 Nov 2009
Updated 30 Nov 2009
Type Conference
Year 2006
Where KDD
Authors Gregory Buehrer, Srinivasan Parthasarathy, Amol Ghoting