
CIKM
2005
Springer

Exact match search in sequence data using suffix trees

We study suitable indexing techniques to support efficient exact match search in large biological sequence databases. We propose a suffix tree (ST) representation, called STA-DF, as an alternative to the array representation of ST (STA) proposed in [7] and utilized in [18]. To study the performance of STA and STA-DF, we develop a memory-efficient ST-based Exact Match (STEM) search algorithm. We implemented STEM and both representations of ST and conducted extensive experiments. Our results indicate that the STA and STA-DF representations are very similar in construction time, storage utilization, and search time using STEM. In terms of the access patterns of STEM, our results show that, compared to STA, the STA-DF representation exhibits better spatial and sequential locality of reference. This suggests that STA-DF would require fewer disk I/Os and hence is more amenable to efficient and scalable disk-based computation. Categories and Subject Descriptors E.1 [Data Structures]...
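To make the idea of exact match search over a suffix index concrete, here is a minimal sketch in Python. It is not the STA or STA-DF representation and not the STEM algorithm, since the abstract does not describe them in detail. It assumes a naive, uncompressed suffix trie held in memory, and the names `build_suffix_trie` and `exact_match` are hypothetical, chosen for illustration only.

```python
def build_suffix_trie(text):
    """Build a naive suffix trie of `text` (illustrative assumption, O(n^2) space).

    Each node is a dict with:
      "children": maps a character to the child node
      "starts":   start positions in `text` of the substring spelled by
                  the path from the root to this node
    """
    root = {"children": {}, "starts": []}
    for start in range(len(text)):
        node = root
        # Insert the suffix text[start:] one character at a time.
        for ch in text[start:]:
            node = node["children"].setdefault(
                ch, {"children": {}, "starts": []}
            )
            node["starts"].append(start)
    return root


def exact_match(root, pattern):
    """Return all start positions where `pattern` occurs exactly.

    Walks the trie along `pattern`; if the walk fails, there is no match.
    An empty pattern returns an empty list in this sketch.
    """
    node = root
    for ch in pattern:
        node = node["children"].get(ch)
        if node is None:
            return []
    return sorted(node["starts"])


if __name__ == "__main__":
    trie = build_suffix_trie("ACGTACGA")
    print(exact_match(trie, "ACG"))  # [0, 4]
    print(exact_match(trie, "TT"))   # []
```

Once the index is built, a query costs time proportional to the pattern length plus the number of reported positions, which is the property that makes suffix-based indexes attractive for large sequence databases. A practical suffix tree compresses single-child paths and, as the abstract notes, its layout in memory or on disk affects locality of reference.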
Mihail Halachev, Nematollaah Shiri, Anand Thamildurai
Added 26 Jun 2010
Updated 26 Jun 2010
Type Conference
Year 2005
Where CIKM
Authors Mihail Halachev, Nematollaah Shiri, Anand Thamildurai