Data structures and compression algorithms for high-throughput sequencing technologies

Background: High-throughput sequencing (HTS) technologies play important roles in the life sciences by allowing the rapid parallel sequencing of very large numbers of relatively short nucleotide sequences, in applications ranging from genome sequencing and resequencing to digital microarrays and ChIP-Seq experiments. As experiments scale up, HTS technologies create new bioinformatics challenges for the storage and sharing of HTS data.

Results: We develop data structures and compression algorithms for HTS data. A processing stage maps short sequences to a reference genome or a large table of sequences. Then the integers representing the short sequences' absolute or relative addresses, their lengths, and the substitutions they may contain are compressed and stored using various entropy coding algorithms, including both old and new fixed codes (e.g. Golomb, Elias Gamma, MOV) and variable codes (e.g. Huffman). The general methodology is illustrated and applied to several HTS data sets. Result...
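The abstract names relative addressing and fixed integer codes such as Elias Gamma and Golomb without spelling them out. The following is a minimal Python sketch, assuming the mapping stage has already produced sorted read start addresses: it turns absolute addresses into gaps (relative addresses) and writes those gaps with an Elias Gamma code and a Golomb code, as strings of '0'/'1' characters for readability. The function names, the +1 shift before Gamma coding, and the Golomb parameter m = 10 are illustrative assumptions, not the paper's implementation; MOV and Huffman codes are not shown.

```python
from itertools import accumulate
from typing import List


def to_gaps(positions: List[int]) -> List[int]:
    """Convert sorted absolute addresses into relative addresses (gaps)."""
    if any(b < a for a, b in zip(positions, positions[1:])):
        raise ValueError("positions must be sorted in non-decreasing order")
    # The first gap is measured from address 0; later gaps from the previous read.
    return [cur - prev for prev, cur in zip([0] + positions[:-1], positions)]


def from_gaps(gaps: List[int]) -> List[int]:
    """Invert to_gaps with a running sum."""
    return list(accumulate(gaps))


def elias_gamma(n: int) -> str:
    """Elias Gamma code of n >= 1: (bit length - 1) zeros, then n in binary."""
    if n < 1:
        raise ValueError("Elias Gamma is defined for n >= 1")
    b = bin(n)[2:]
    return "0" * (len(b) - 1) + b


def decode_gamma(bits: str) -> List[int]:
    """Decode a concatenation of Elias Gamma codewords."""
    out, i = [], 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":          # count the zero prefix
            zeros += 1
            i += 1
        out.append(int(bits[i:i + zeros + 1], 2))
        i += zeros + 1
    return out


def _fixed(value: int, width: int) -> str:
    """value written in exactly `width` bits ('' when width is 0)."""
    return format(value, f"0{width}b") if width else ""


def golomb(n: int, m: int) -> str:
    """Golomb code of n >= 0 with parameter m >= 1: quotient in unary
    (q ones, then a 0), remainder in truncated binary."""
    if n < 0 or m < 1:
        raise ValueError("need n >= 0 and m >= 1")
    q, r = divmod(n, m)
    b = (m - 1).bit_length()           # ceil(log2 m)
    u = (1 << b) - m                   # remainders below u use b - 1 bits
    tail = _fixed(r, b - 1) if r < u else _fixed(r + u, b)
    return "1" * q + "0" + tail


def golomb_decode(bits: str, m: int) -> List[int]:
    """Decode a concatenation of Golomb codewords sharing the same m."""
    b = (m - 1).bit_length()
    u = (1 << b) - m
    out, i = [], 0
    while i < len(bits):
        q = 0
        while bits[i] == "1":          # unary quotient
            q += 1
            i += 1
        i += 1                         # skip the unary terminator 0
        r = 0
        if b:
            x = int(bits[i:i + b - 1], 2) if b > 1 else 0
            i += b - 1
            if x >= u:                 # long form: read one more bit
                x = 2 * x + int(bits[i])
                i += 1
                x -= u
            r = x
        out.append(q * m + r)
    return out


if __name__ == "__main__":
    positions = [100, 103, 103, 110, 250]   # hypothetical read start addresses
    gaps = to_gaps(positions)
    # Gaps may be 0 (two reads at one address), so Gamma-code gap + 1.
    g_bits = "".join(elias_gamma(g + 1) for g in gaps)
    r_bits = "".join(golomb(g, 10) for g in gaps)   # m = 10 chosen by hand
    assert from_gaps([x - 1 for x in decode_gamma(g_bits)]) == positions
    assert from_gaps(golomb_decode(r_bits, 10)) == positions
    print(len(g_bits), len(r_bits))
```

Gap encoding is what makes these fixed codes pay off: densely mapped reads produce small gaps, and both codes spend few bits on small integers. Golomb codes suit roughly geometric gap distributions but need a parameter m tuned to the data, whereas Elias Gamma is parameter-free.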

Kenny Daily, Paul Rigor, Scott Christley, Xiaohui Xie, Pierre Baldi

BMCBI 2010 | General Purpose Compression | HTS Data | Purpose Compression Programs |

» High-Throughput Data Compressor Designs Using Content Addressable Memory

» High-Throughput 3D Structural Homology Detection via NMR Resonance Assignment

» Assessment of algorithms for high throughput detection of genomic copy number variation in...

» Opportunistic Data Structures for Range Queries

» Persistent Indexing Technology for Large Sequences

» Phylogenetic Comparative Assembly

» RNACompress: Grammar-based compression and informational complexity measurement of RNA secon...

» baySeq: Empirical Bayesian methods for identifying differential expression in sequence coun...

Post Info

Added	09 Dec 2010
Updated	09 Dec 2010
Type	Journal
Year	2010
Where	BMCBI
Authors	Kenny Daily, Paul Rigor, Scott Christley, Xiaohui Xie, Pierre Baldi

Sciweavers
