Sciweavers

CIKM
2009
Springer

Suffix trees for very large genomic sequences

A suffix tree is a fundamental data structure for string searching algorithms. Unfortunately, current methods for constructing suffix trees do not scale to the large inputs found in real-life applications. All existing practical algorithms perform random access to the input string and therefore require that the input be small enough to fit in main memory. We are the first to present an algorithm that can construct suffix trees for input sequences significantly larger than the available main memory. As a proof of concept, we show that our method builds the suffix tree for 12GB of real DNA sequences in 26 hours on a single machine with 2GB of RAM. This input is four times the size of the Human Genome, and the construction of suffix trees for inputs of this magnitude has never been reported before.
Categories and Subject Descriptors H.3.1 [Information Storage and Retrieval]: Content Analysis and Indexing
General Terms Algorit...
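For readers unfamiliar with the data structure named in the abstract, the following is a minimal sketch of the idea, not the authors' algorithm. It builds a naive, uncompressed suffix trie (a simpler relative of the suffix tree) entirely in memory and uses it to find every occurrence of a pattern. The function names `build_suffix_trie` and `find_occurrences` are hypothetical, and the sketch assumes the input does not contain the terminator character `$`. Because it keeps both the whole string and the whole index in memory and uses quadratic space, it is practical only for short inputs, which is the kind of in-memory limitation the abstract describes.

```python
def build_suffix_trie(text):
    """Build a naive, uncompressed suffix trie as nested dicts.

    Illustrative only: uses O(n^2) space and keeps everything in memory.
    Assumes `text` does not contain the terminator "$".
    """
    text += "$"  # terminator, so that no suffix is a prefix of another
    root = {}
    for start in range(len(text)):
        node = root
        # Insert the suffix text[start:] one character at a time.
        for ch in text[start:]:
            node = node.setdefault(ch, {})
        # Mark the leaf with the suffix's start position.
        # The key None cannot collide with a one-character string key.
        node[None] = start
    return root


def find_occurrences(trie, pattern):
    """Return the sorted start positions of `pattern` in the indexed text."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return []  # pattern is not a substring
        node = node[ch]
    # Every leaf below this node is a suffix that starts with `pattern`.
    positions = []
    stack = [node]
    while stack:
        current = stack.pop()
        for key, child in current.items():
            if key is None:
                positions.append(child)
            else:
                stack.append(child)
    return sorted(positions)


trie = build_suffix_trie("GATTACA")
print(find_occurrences(trie, "A"))   # [1, 4, 6]
print(find_occurrences(trie, "TA"))  # [3]
print(find_occurrences(trie, "GG"))  # []
```

Once the index is built, a lookup walks at most one node per pattern character before collecting the matching positions, which is why such indexes are attractive for searching long DNA sequences.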
Marina Barsky, Ulrike Stege, Alex Thomo, Chris Upton
Added 14 Aug 2010
Updated 14 Aug 2010
Type Conference
Year 2009
Where CIKM
Authors Marina Barsky, Ulrike Stege, Alex Thomo, Chris Upton