COMAD
2008

REBMEC: Repeat Based Maximum Entropy Classifier for Biological Sequences

An important problem in biological data analysis is to predict the family of a newly discovered sequence, such as a protein or DNA sequence, using the collection of available sequences. In this paper we tackle this problem and present REBMEC, a Repeat Based Maximum Entropy Classifier for biological sequences. Maximum entropy models are known to be theoretically robust and to yield high accuracy, but they are slow; this makes them useful as benchmarks for evaluating other classifiers. Specifically, REBMEC is based on the classical Generalized Iterative Scaling (GIS) algorithm and incorporates repeated occurrences of subsequences within each sequence. REBMEC uses maximal frequent subsequences as features but can support other types of features as well. Our extensive experiments on two collections of protein families show that REBMEC performs as well as existing state-of-the-art probabilistic classifiers for biological sequences without using domain-specific background knowledge such as multiple alignmen...
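To illustrate the two ingredients the abstract names, GIS training and features that count repeated subsequence occurrences, here is a minimal sketch of a generic conditional maximum entropy classifier. It is not REBMEC: the authors mine maximal frequent subsequences, whereas this sketch assumes a fixed, hand-picked list of contiguous motifs. The names `GISMaxEnt` and `count_occurrences`, the toy sequences, and the family labels are all hypothetical. Each feature pairs a motif `s` with a label `c` and takes the value count(s, x) when the label is `c`, so a motif repeated k times contributes k rather than 1. As a simplification, (motif, label) pairs never seen in training keep weight 0.

```python
import math
from collections import defaultdict


def count_occurrences(seq: str, motif: str) -> int:
    """Count occurrences of motif in seq, overlaps and repeats included."""
    m = len(motif)
    return sum(1 for i in range(len(seq) - m + 1) if seq[i:i + m] == motif)


class GISMaxEnt:
    """Hypothetical conditional maximum entropy classifier trained with GIS.

    Feature (s, c) has value count(s, x) when the label is c, else 0.
    """

    def __init__(self, motifs, iterations=100):
        self.motifs = list(motifs)  # assumed fixed motif list (not mined)
        self.iterations = iterations
        self.weights = {}  # (motif, label) -> lambda
        self.labels = []

    def _features(self, seq):
        # motif -> repeat count, keeping only motifs that occur in seq
        counts = {s: count_occurrences(seq, s) for s in self.motifs}
        return {s: v for s, v in counts.items() if v > 0}

    def _proba(self, feats):
        # p(c | x) is proportional to exp(sum_s lambda_{s,c} * count(s, x))
        scores = {c: sum(self.weights.get((s, c), 0.0) * v for s, v in feats.items())
                  for c in self.labels}
        top = max(scores.values())  # subtract the max for numerical stability
        exps = {c: math.exp(v - top) for c, v in scores.items()}
        z = sum(exps.values())
        return {c: e / z for c, e in exps.items()}

    def fit(self, seqs, labels):
        data = [(self._features(x), y) for x, y in zip(seqs, labels)]
        self.labels = sorted(set(labels))
        n = len(data)
        # Empirical expectation of each (motif, label) feature.
        empirical = defaultdict(float)
        for feats, y in data:
            for s, v in feats.items():
                empirical[(s, y)] += v / n
        # Simplification: pairs never seen in training are left out (weight 0).
        self.weights = {k: 0.0 for k in empirical}
        # GIS constant C bounds the total motif count of any training sequence.
        # The slack feature C - total(x) is the same for every label, so it
        # cancels in p(c | x) and needs no weight of its own.
        C = max(sum(f.values()) for f, _label in data) or 1
        for _it in range(self.iterations):
            expected = defaultdict(float)
            for feats, _label in data:
                for c, p in self._proba(feats).items():
                    for s, v in feats.items():
                        if (s, c) in self.weights:
                            expected[(s, c)] += p * v / n
            # GIS update: lambda_i += (1 / C) * log(E_empirical[f_i] / E_model[f_i])
            for k in self.weights:
                self.weights[k] += math.log(empirical[k] / expected[k]) / C
        return self

    def predict_proba(self, seq):
        return self._proba(self._features(seq))

    def predict(self, seq):
        proba = self.predict_proba(seq)
        return max(proba, key=proba.get)


# Hypothetical toy data, for illustration only.
train_seqs = ["ACGACGACG", "ACGTT", "GGTTGGTT", "TTGGA"]
train_labels = ["fam_X", "fam_X", "fam_Y", "fam_Y"]
model = GISMaxEnt(motifs=["ACG", "TTG", "GG"]).fit(train_seqs, train_labels)
print(model.predict("ACGACG"), model.predict("GGTTGG"))  # prints: fam_X fam_Y
```

Counting overlapping repeats, rather than recording mere presence, is what lets a sequence such as "ACGACGACG" weigh more heavily toward its family than "ACGTT". The step size 1/C is the standard GIS choice. This is also why GIS tends to be slow: C grows with the number of repeated occurrences, so each update is small.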
Pratibha Rani, Vikram Pudi
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference