Sciweavers

NIPS
2004

Discrete profile alignment via constrained information bottleneck

Amino acid profiles, which capture position-specific mutation probabilities, are a richer encoding of biological sequences than the individual sequences themselves. However, profile comparisons are much more computationally expensive than discrete symbol comparisons, making profiles impractical for many large datasets. Furthermore, because they are such a rich representation, profiles can be difficult to visualize. To overcome these problems, we propose a discretization for profiles using an expanded alphabet representing not just individual amino acids, but common profiles. By using an extension of information bottleneck (IB) incorporating constraints and priors on the class distributions, we find an informationally optimal alphabet. This discretization yields a concise, informative textual representation for profile sequences. Also, alignments between these sequences, while nearly as accurate as the full profile-profile alignments, can be computed almost as quickly as those between in...
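To make the idea of an expanded alphabet concrete, the sketch below maps each profile column to the letter whose prototype distribution is nearest in KL divergence. This is only the assignment step that IB reduces to in the hard-clustering limit, not the constrained IB procedure the abstract describes. The names `kl_divergence`, `discretize`, `ALPHABET`, and `PROFILE` are hypothetical, and the three-symbol distributions are made-up toy values rather than real amino acid profiles.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """Return KL(p || q) in bits for two discrete distributions of equal length."""
    return sum(
        pi * math.log2(pi / max(qi, eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def discretize(profile, alphabet):
    """Encode a profile as text: one letter per column, chosen by minimum KL."""
    return "".join(
        min(alphabet, key=lambda letter: kl_divergence(column, alphabet[letter]))
        for column in profile
    )


# Assumed toy alphabet over three symbols. Uppercase letters stand for classes
# dominated by one symbol; the lowercase letter stands for a mixed class.
ALPHABET = {
    "A": [0.90, 0.05, 0.05],
    "B": [0.05, 0.90, 0.05],
    "c": [0.45, 0.45, 0.10],
}

# Assumed toy profile: each row is one position's distribution over the symbols.
PROFILE = [
    [0.80, 0.10, 0.10],
    [0.50, 0.40, 0.10],
    [0.10, 0.85, 0.05],
]

encoded = discretize(PROFILE, ALPHABET)
print(encoded)
```

Once profiles are encoded as strings like this, they can be compared with ordinary symbol-level scoring, which is the source of the speedup the abstract refers to.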
Added 31 Oct 2010
Updated 31 Oct 2010
Type Conference
Year 2004
Where NIPS
Authors Sean O'Rourke, Gal Chechik, Robin Friedman, Eleazar Eskin