Sciweavers

BMCBI
2005

Genome comparison without alignment using shortest unique substrings

Background: Sequence comparison by alignment is a fundamental tool of molecular biology. In this paper we show how a number of sequence comparison tasks, including the detection of unique genomic regions, can be accomplished efficiently without an alignment step. Our procedure for nucleotide sequence comparison is based on shortest unique substrings. These are substrings which occur only once within the sequence or set of sequences analysed and which cannot be further reduced in length without losing the property of uniqueness. Such substrings can be detected using generalized suffix trees.

Results: We find that the shortest unique substrings in Caenorhabditis elegans, human and mouse are no longer than 11 bp in the autosomes of these organisms. In mouse and human these unique substrings are significantly clustered in upstream regions of known genes. Moreover, the probability of finding such short unique substrings in the genomes of human or mouse by chance is extremely small. We deri...
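To make the definition in the abstract concrete, the following is a minimal sketch that finds substrings which occur exactly once and cannot be shortened from either end without losing uniqueness. It is not the authors' method: the abstract says such substrings can be detected with generalized suffix trees, whereas this sketch substitutes brute-force k-mer counting, which is only practical for short toy sequences. The function name `minimal_unique_substrings`, the `max_len` parameter, and the restriction to a single sequence on one strand are assumptions made for illustration.

```python
from collections import Counter


def minimal_unique_substrings(seq, max_len=None):
    """Return (start, substring) pairs for unique substrings of seq that
    cannot be shortened without losing uniqueness.

    Brute-force illustration only (hypothetical helper, not the paper's
    suffix-tree procedure); handles one sequence on one strand.
    """
    n = len(seq)
    if max_len is None:
        max_len = n
    results = []
    prev = None  # occurrence counts of all substrings of length k - 1
    for k in range(1, max_len + 1):
        cur = Counter(seq[i:i + k] for i in range(n - k + 1))
        for i in range(n - k + 1):
            s = seq[i:i + k]
            if cur[s] != 1:
                continue  # s is repeated, so it is not unique
            # s is minimal if dropping its last or its first character
            # yields a repeated substring; every shorter substring of s
            # lies inside one of those two and is then repeated as well.
            if k == 1 or (prev[s[:-1]] > 1 and prev[s[1:]] > 1):
                results.append((i, s))
        prev = cur
    return results


if __name__ == "__main__":
    for start, sub in minimal_unique_substrings("ACGTACGA"):
        print(start, sub)
```

The length of the shortest entry in the returned list corresponds to the kind of genome-wide figure quoted in the Results (for example, "no longer than 11 bp"); on real genomes a suffix-tree or suffix-array based approach would be needed instead of this quadratic scan.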
Added 15 Dec 2010
Updated 15 Dec 2010
Type Journal
Year 2005
Where BMCBI
Authors Bernhard Haubold, Nora Pierstorff, Friedrich Möller, Thomas Wiehe