Designing seeds for similarity search in genomic DNA

Large-scale comparison of genomic DNA is of fundamental importance in annotating functional elements of genomes. To perform large comparisons efficiently, BLAST (Methods: Companion Methods Enzymol 266 (1996) 460, J. Mol. Biol. 215 (1990) 403, Nucleic Acids Res. 25(17) (1997) 3389) and other widely used tools use seeded alignment, which compares only sequences that can be shown to share a common pattern or "seed" of matching bases. The literature suggests that the choice of seed substantially affects the sensitivity of seeded alignment, but designing and evaluating seeds is computationally challenging. This work addresses the problem of designing a seed to optimize performance of seeded alignment. We give a fast, simple algorithm based on finite automata for evaluating the sensitivity of a seed in a Markov model of ungapped alignments, along with extensions to mixtures and inhomogeneous Markov models. We give intuition and theoretical results on which seeds are good cho...
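To make the notion of seed sensitivity concrete, here is a minimal Python sketch. It is not the authors' automaton construction: it is a brute-force dynamic program over the last few match/mismatch positions, and it assumes the simplest possible model, in which every alignment position is a match independently with probability p (a zeroth-order special case of the Markov models the abstract mentions). The function name `seed_sensitivity`, the 0/1 seed string encoding ('1' = position must match, '0' = don't care), and the parameters are all illustrative assumptions.

```python
from itertools import product


def seed_sensitivity(seed: str, length: int, p: float) -> float:
    """Probability that `seed` matches somewhere in a random ungapped
    alignment of `length` positions.

    Assumption (not from the source): positions are independent, each a
    match (1) with probability p and a mismatch (0) with probability 1 - p.
    """
    span = len(seed)
    if length < span:
        return 0.0
    # Offsets within the seed window that are required to be matches.
    required = [i for i, c in enumerate(seed) if c == "1"]

    # State = the last (span - 1) alignment bits. The stored value is the
    # probability of reaching that state without any seed hit so far.
    k = span - 1
    states = {}
    for bits in product((0, 1), repeat=k):
        prob = 1.0
        for b in bits:
            prob *= p if b else 1.0 - p
        states[bits] = prob

    # Extend the alignment one position at a time.
    for _ in range(length - k):
        nxt = {}
        for bits, prob in states.items():
            for b in (0, 1):
                window = bits + (b,)
                if all(window[i] for i in required):
                    continue  # seed hit: this mass leaves the no-hit set
                key = window[1:]
                q = prob * (p if b else 1.0 - p)
                nxt[key] = nxt.get(key, 0.0) + q
        states = nxt

    # Sensitivity is one minus the probability of never being hit.
    return 1.0 - sum(states.values())
```

Calling `seed_sensitivity("11", 3, 0.5)` and `seed_sensitivity("101", 3, 0.5)` compares a contiguous and a spaced seed of the same weight on a toy alignment. The state space grows as 2^(span - 1), so this sketch is only practical for short seeds; avoiding that blow-up is what a more careful automaton-based method is for.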
Jeremy Buhler, Uri Keich, Yanni Sun
Added 03 Dec 2009
Updated 03 Dec 2009
Type Conference
Year 2003