Fast Motif Search in Protein Sequence Databases

Regular expression pattern matching is widely used in computational biology. Searching a database of sequences for a motif (a simple regular expression) or its variations is an important interactive process that requires fast motif-matching algorithms. In this paper, we explore and evaluate various suffix-tree representations of the sequence database for two types of query problems for a given regular expression: 1) find the first match, and 2) find all matches. Answering Problem 1 quickly increases the level and effectiveness of interactive motif exploration. We propose a framework in which Problem 1 can be solved faster than with existing solutions without slowing down the solution of Problem 2. We apply several heuristics, both at the level of suffix tree creation, resulting in modified tree representations, and at the regular expression matching level, in which we search subtrees in a given predefined order by simulating a deterministic finite automaton that we ...
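The general idea of matching a motif against an index of suffixes can be illustrated with a minimal sketch. This is not the authors' implementation and includes none of their heuristics. It assumes an uncompressed suffix trie in place of a suffix tree, and a fixed-length PROSITE-style motif (for example `C-x-[DE]`) whose automaton is a simple chain of states. All names (`Node`, `build_suffix_trie`, `parse_motif`, `iter_matches`, `first_match`, `all_matches`) are hypothetical.

```python
class Node:
    """One node of an uncompressed suffix trie (illustrative, memory-hungry)."""
    __slots__ = ("children", "starts")

    def __init__(self):
        self.children = {}  # residue character -> child Node
        self.starts = []    # (sequence index, start offset) of each suffix through this node


def build_suffix_trie(sequences):
    """Insert every suffix of every sequence; quadratic cost, acceptable for a sketch only."""
    root = Node()
    for seq_id, seq in enumerate(sequences):
        for start in range(len(seq)):
            node = root
            for ch in seq[start:]:
                node = node.children.setdefault(ch, Node())
                node.starts.append((seq_id, start))
    return root


def parse_motif(motif):
    """Turn 'C-x-[DE]' into [{'C'}, None, {'D', 'E'}]; None stands for the wildcard x."""
    positions = []
    for token in motif.split("-"):
        positions.append(None if token == "x" else set(token.strip("[]")))
    return positions


def iter_matches(root, positions):
    """Walk the trie while advancing a chain-shaped automaton: state i means
    the first i motif positions have been matched along the current path."""
    stack = [(root, 0)]
    while stack:
        node, state = stack.pop()
        if state == len(positions):
            # Accepting state: every suffix passing through this node starts a match.
            yield from node.starts
            continue
        allowed = positions[state]
        for ch, child in node.children.items():
            if allowed is None or ch in allowed:
                stack.append((child, state + 1))


def first_match(root, positions):
    """Problem 1: stop at the first match found (None if there is none)."""
    return next(iter_matches(root, positions), None)


def all_matches(root, positions):
    """Problem 2: enumerate every match as (sequence index, start offset)."""
    return sorted(iter_matches(root, positions))
```

Because `iter_matches` is a generator, `first_match` stops the traversal as soon as one accepting node is reached, while `all_matches` exhausts it. The order in which subtrees are pushed onto the stack determines which match is found first, which is the kind of choice the abstract's subtree-ordering heuristics concern.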
Elena Zheleva, Abdullah N. Arslan
Added 22 Aug 2010
Updated 22 Aug 2010
Type Conference
Year 2006
Where CSR