Sciweavers

SPIRE
2009
Springer

Faster Algorithms for Sampling and Counting Biological Sequences

Abstract. A set of sequences S is pairwise bounded if the Hamming distance between any pair of sequences in S is at most 2d. The Consensus Sequence problem aims to distinguish pairwise bounded sets that have a consensus, finding one such sequence s∗ when it exists, from those that do not. This problem is closely related to the motif-recognition problem, which abstractly models finding important subsequences in biological data. We give an efficient algorithm for sampling pairwise bounded sets, referred to as MarkovSampling, and show that it generates pairwise bounded sets uniformly at random. We illustrate the applicability of MarkovSampling to efficiently solving motif-recognition instances. Computing the expected number of motif sets has been a long-standing open problem in motif-recognition [1, 3]. We consider the related problem of counting the number of pairwise bounded sets, give new bounds on the number of pairwise bounded sets, and present an algorithmic approach to counting the number of pa...
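The definitions in the abstract can be illustrated with a small sketch. It is not MarkovSampling or any other algorithm from the paper. It only checks the pairwise-bounded condition (Hamming distance at most 2d between every pair) and searches for a consensus by brute force. The abstract does not define "consensus"; the sketch assumes the common meaning of a sequence s∗ within Hamming distance d of every sequence in S. The function names and the default alphabet "ACGT" are hypothetical, chosen for illustration.

```python
from itertools import combinations, product


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def is_pairwise_bounded(seqs, d):
    """True if every pair of sequences is within Hamming distance 2*d."""
    return all(hamming(a, b) <= 2 * d for a, b in combinations(seqs, 2))


def brute_force_consensus(seqs, d, alphabet="ACGT"):
    """Return some sequence within distance d of every input, or None.

    Assumption: 'consensus' means a sequence within distance d of all
    members of the set. The search is exponential in the sequence length
    and is meant only for tiny examples.
    """
    length = len(seqs[0])
    for candidate in product(alphabet, repeat=length):
        s = "".join(candidate)
        if all(hamming(s, t) <= d for t in seqs):
            return s
    return None


# A pairwise bounded set (d = 1) that has a consensus.
with_consensus = ["AAAA", "AATT", "ATAT"]
# A pairwise bounded set (d = 1) that has no consensus.
without_consensus = ["AAA", "ATT", "TAT", "TTA"]

result_with = brute_force_consensus(with_consensus, 1)
result_without = brute_force_consensus(without_consensus, 1)
```

Both example sets satisfy the pairwise bound for d = 1, yet only the first admits a consensus, which is the distinction the Consensus Sequence problem asks an algorithm to draw.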
Christina Boucher
Added 27 Jul 2010
Updated 27 Jul 2010
Type Conference
Year 2009
Where SPIRE
Authors Christina Boucher