We present a multi-dimensional indexing approach for fast sequence similarity search in DNA and protein databases. In particular, we propose effective transformations of subsequen...
Approximate string matching on large DNA sequences data is very important in bioinformatics. Some studies have shown that suffix tree is an efficient data structure for approxim...
Background: High-throughput sequencing makes it possible to rapidly obtain thousands of 16S rDNA sequences from environmental samples. Bioinformatic tools for the analyses of larg...
Yanan Yu, Mya Breitbart, Pat McNairnie, Forest Roh...
A method is described for identification and classification of proteins encoded in large DNA sequences. Previously, an automated system was introduced for the general detection of...
Background: Publicly available DNA sequence databases such as GenBank are large, and are growing at an exponential rate. The sheer volume of data being dealt with presents serious...