Fast vertical mining using diffsets

10 years 6 months ago
Fast vertical mining using diffsets
A number of vertical mining algorithms have been proposed recently for association mining, which have shown to be very effective and usually outperform horizontal approaches. The main advantage of the vertical format is support for fast frequency counting via intersection operations on transaction ids (tids) and automatic pruning of irrelevant data. The main problem with these approaches is when intermediate results of vertical tid lists become too large for memory, thus affecting the algorithm scalability. In this paper we present a novel vertical data representation called Diffset, that only keeps track of differences in the tids of a candidate pattern from its generating frequent patterns. We show that diffsets drastically cut down the size of memory required to store intermediate results. We show how diffsets, when incorporated into previous vertical mining methods, increase the performance significantly. Categories and Subject Descriptors H.2.8 [Database Management]: Data Mining ...
Mohammed Javeed Zaki, Karam Gouda
Added 30 Nov 2009
Updated 30 Nov 2009
Type Conference
Year 2003
Where KDD
Authors Mohammed Javeed Zaki, Karam Gouda
Comments (0)