Sciweavers

BMCBI
2010

Data structures and compression algorithms for high-throughput sequencing technologies

Background: High-throughput sequencing (HTS) technologies play important roles in the life sciences by allowing the rapid parallel sequencing of very large numbers of relatively short nucleotide sequences, in applications ranging from genome sequencing and resequencing to digital microarrays and ChIP-Seq experiments. As experiments scale up, HTS technologies create new bioinformatics challenges for the storage and sharing of HTS data. Results: We develop data structures and compression algorithms for HTS data. A processing stage maps short sequences to a reference genome or a large table of sequences. Then the integers representing the absolute or relative addresses of the short sequences, their lengths, and the substitutions they may contain are compressed and stored using various entropy coding algorithms, including both old and new fixed codes (e.g., Golomb, Elias Gamma, MOV) and variable codes (e.g., Huffman). The general methodology is illustrated and applied to several HTS data sets. Result...
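To make the idea of coding relative addresses with a fixed code concrete, here is a minimal Python sketch of Elias gamma coding applied to the gaps between mapped read positions. It is an illustration only, not the authors' implementation: the function names are hypothetical, the bit stream is kept as a string of "0"/"1" characters for readability, and storing each gap plus one (so that several reads at the same position remain encodable) is an assumption of this sketch.

```python
from typing import Iterable, List


def elias_gamma_encode(n: int) -> str:
    """Return the Elias gamma code of a positive integer as a bit string."""
    if n < 1:
        raise ValueError("Elias gamma codes are defined for integers >= 1")
    binary = bin(n)[2:]  # binary digits of n, leading 1 included
    # Unary length prefix (len - 1 zeros) followed by the binary digits.
    return "0" * (len(binary) - 1) + binary


def elias_gamma_decode(bits: str) -> List[int]:
    """Decode a concatenation of Elias gamma codes back into integers."""
    values = []
    i = 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":  # count the zeros of the length prefix
            zeros += 1
            i += 1
        values.append(int(bits[i:i + zeros + 1], 2))
        i += zeros + 1
    return values


def compress_positions(positions: Iterable[int]) -> str:
    """Turn non-negative positions into gaps, then gamma-code each gap + 1."""
    out = []
    previous = 0
    for pos in sorted(positions):
        out.append(elias_gamma_encode(pos - previous + 1))
        previous = pos
    return "".join(out)


def decompress_positions(bits: str) -> List[int]:
    """Invert compress_positions by undoing the + 1 and summing the gaps."""
    positions = []
    current = 0
    for value in elias_gamma_decode(bits):
        current += value - 1
        positions.append(current)
    return positions


if __name__ == "__main__":
    reads = [1000, 1003, 1003, 1050]  # hypothetical mapped start positions
    encoded = compress_positions(reads)
    assert decompress_positions(encoded) == sorted(reads)
    print(len(encoded), "bits for", len(reads), "positions")
```

Small gaps get short codewords, so densely mapped reads cost only a few bits each, whereas an absolute address always costs roughly the logarithm of the reference length.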
Added 09 Dec 2010
Updated 09 Dec 2010
Type Journal
Year 2010
Where BMCBI
Authors Kenny Daily, Paul Rigor, Scott Christley, Xiaohui Xie, Pierre Baldi