Mining Compressing Sequential Patterns

10 years 4 months ago
Mining Compressing Sequential Patterns
Compression based pattern mining has been successfully applied to many data mining tasks. We propose an approach based on the minimum description length principle to extract sequential patterns that compress a database of sequences well. We show that mining compressing patterns is NP-Hard and belongs to the class of inapproximable problems. We propose two heuristic algorithms to mining compressing patterns. The first uses a two-phase approach similar to Krimp for itemset data. To overcome performance with the required candidate generation we propose GoKrimp, an effective greedy algorithm that directly mines compressing patterns. We conduct an empirical study on six real-life datasets to compare the proposed algorithms by run time, compressibility, and classification accuracy using the patterns found as features for SVM classifiers.
Hoang Thanh Lam, Fabian Moerchen, Dmitriy Fradkin,
Added 29 Sep 2012
Updated 29 Sep 2012
Type Journal
Year 2012
Where SDM
Authors Hoang Thanh Lam, Fabian Moerchen, Dmitriy Fradkin, Toon Calders
Comments (0)