
CIKM
2004
Springer

Scalable sequential pattern mining for biological sequences

Biosequences typically have a small alphabet, a long length, and patterns containing gaps (i.e., “don’t care” positions) of arbitrary size. Mining frequent patterns in such sequences faces a different type of explosion than in transaction sequences, which were primarily motivated by market-basket analysis. In this paper, we study how this explosion affects classic sequential pattern mining and present a scalable two-phase algorithm to deal with it. The Segment Phase first searches for short patterns containing no gaps, called segments; this phase is efficient. The Pattern Phase then searches for long patterns containing multiple segments separated by variable-length gaps; this phase is time-consuming. The purpose of the two phases is to exploit the information obtained from the first phase to speed up pattern growth and matching and to prune the search space in the second phase. We evaluate this approach on synthetic and real-life data sets. Categories and Subject Descriptors: H.2....
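The two-phase idea in the abstract can be illustrated with a deliberately naive sketch. It is not the authors' algorithm and has none of its scalability. It only shows how gap-free segments found first can restrict which multi-segment patterns are tried later. Several things here are assumptions made for illustration: the function names, the fixed segment length `seg_len`, defining support as the number of sequences that contain a pattern, allowing gaps of any length including zero, and the `max_segments` cap.

```python
from collections import defaultdict


def frequent_segments(sequences, seg_len, min_sup):
    """Segment Phase (naive): gap-free substrings of length seg_len
    that occur in at least min_sup sequences.
    Returns {segment: set of supporting sequence ids}."""
    support = defaultdict(set)
    for sid, seq in enumerate(sequences):
        for i in range(len(seq) - seg_len + 1):
            support[seq[i:i + seg_len]].add(sid)
    return {seg: sids for seg, sids in support.items() if len(sids) >= min_sup}


def matches(seq, pattern):
    """True if the segments of pattern occur in seq in order and without
    overlap, separated by gaps of any length (assumption: zero allowed).
    Greedy leftmost matching is sufficient for this test."""
    pos = 0
    for seg in pattern:
        idx = seq.find(seg, pos)
        if idx < 0:
            return False
        pos = idx + len(seg)
    return True


def grow_patterns(sequences, segments, min_sup, max_segments):
    """Pattern Phase (naive): level-wise growth that appends one frequent
    segment at a time. Returns {pattern tuple: set of supporting ids}."""
    level = {(seg,): sids for seg, sids in segments.items()}
    frequent = dict(level)
    for _ in range(max_segments - 1):
        next_level = {}
        for pattern, sids in level.items():
            for seg, seg_sids in segments.items():
                # Reuse phase-one information: only sequences that support
                # both the current pattern and the new segment can match.
                candidates = sids & seg_sids
                if len(candidates) < min_sup:
                    continue
                grown = pattern + (seg,)
                supporting = {s for s in candidates if matches(sequences[s], grown)}
                if len(supporting) >= min_sup:
                    next_level[grown] = supporting
        if not next_level:
            break
        frequent.update(next_level)
        level = next_level
    return frequent
```

A call such as `grow_patterns(seqs, frequent_segments(seqs, 3, 2), 2, 3)` would return every pattern of up to three segments supported by at least two sequences in `seqs`. Intersecting the supporting-sequence sets before matching is one simple way to let the first phase prune the second; the paper's actual pruning and matching techniques are not described in the abstract.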
Ke Wang, Yabo Xu, Jeffrey Xu Yu
Added 01 Jul 2010
Updated 01 Jul 2010
Type Conference
Year 2004
Where CIKM