Storage and Retrieval of Individual Genomes

A repetitive sequence collection is one where portions of a base sequence of length n are repeated many times with small variations, forming a collection of total length N. Examples of such collections are version control data and genome sequences of individuals, where the differences can be expressed by lists of basic edit operations. Flexible and efficient data analysis on such a typically huge collection is plausible using suffix trees. However, a suffix tree occupies O(N log N) bits, which very soon inhibits in-memory analyses. Recent advances in full-text self-indexing reduce the space of the suffix tree to O(N log σ) bits, where σ is the alphabet size. In practice, the space reduction is more than 10-fold, for example on the suffix tree of the Human Genome. However, this reduction factor remains constant when more sequences are added to the collection. We develop a new family of self-indexes suited for the repetitive sequence collection setting. Their expected space requirement depends only on th...
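As a rough illustration of the two space bounds quoted in the abstract, the following sketch compares N log N with N log σ while ignoring all hidden constants and lower-order terms. The collection length of 3 billion symbols and the DNA alphabet size of 4 are assumptions chosen for illustration, not figures from the paper, and the function names are hypothetical.

```python
import math


def pointer_index_bits(total_length: int) -> float:
    """N * log2(N): growth term of the suffix tree bound, constants ignored."""
    return total_length * math.log2(total_length)


def self_index_bits(total_length: int, alphabet_size: int) -> float:
    """N * log2(sigma): growth term of the self-index bound, constants ignored."""
    return total_length * math.log2(alphabet_size)


# Assumed values for illustration only (not taken from the paper).
N = 3_000_000_000  # roughly one human-genome-sized sequence
SIGMA = 4          # DNA alphabet A, C, G, T

ratio = pointer_index_bits(N) / self_index_bits(N, SIGMA)
print(f"N log N / N log sigma = {ratio:.1f}")

# Adding a second, nearly identical sequence doubles N, so the
# N log sigma term doubles as well: the bound ignores repetitiveness.
doubled = self_index_bits(2 * N, SIGMA) / self_index_bits(N, SIGMA)
print(f"growth of N log sigma when N doubles = {doubled:.1f}")
```

Under these assumptions the ratio of the two growth terms lands in the same range as the more than 10-fold reduction mentioned above, and the second figure shows why a bound that is linear in N does not benefit from the collection being repetitive.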
Added 23 Nov 2009
Updated 23 Nov 2009
Type Conference
Year 2009
Authors Gonzalo Navarro, Jouni Sirén, Niko Välimäki, Veli Mäkinen