REBMEC: Repeat Based Maximum Entropy Classifier for Biological Sequences

Download www.cse.iitb.ac.in

An important problem in biological data analysis is predicting the family of a newly discovered sequence, such as a protein or DNA sequence, from the collection of available sequences. In this paper we tackle this problem and present REBMEC, a Repeat Based Maximum Entropy Classifier for biological sequences. Maximum entropy models are known to be theoretically robust and to yield high accuracy, but they are slow, which makes them useful as benchmarks for evaluating other classifiers. Specifically, REBMEC is based on the classical Generalized Iterative Scaling (GIS) algorithm and incorporates repeated occurrences of subsequences within each sequence. REBMEC uses maximal frequent subsequences as features but can support other types of features as well. Our extensive experiments on two collections of protein families show that REBMEC performs as well as existing state-of-the-art probabilistic classifiers for biological sequences without using domain-specific background knowledge such as multiple alignmen...
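
The idea in the abstract (a maximum entropy classifier trained with GIS, whose features count repeated occurrences of subsequences) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`repeat_counts`, `predict_proba`, `train_gis`) are hypothetical, and fixed-length substrings (k-mers) are assumed here as a simple stand-in for the maximal frequent subsequences that REBMEC uses. Features that never occur with a class in training are simply left out, which is a common simplification rather than something the abstract specifies.

```python
import math
from collections import Counter


def repeat_counts(seq, k=2):
    """Count every occurrence, repeats included, of each length-k substring.

    Assumption: k-mers stand in for maximal frequent subsequences.
    """
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def predict_proba(weights, classes, counts):
    """Return p(class | sequence), proportional to exp(sum of weight * count)."""
    scores = {
        c: sum(weights.get((s, c), 0.0) * n for s, n in counts.items())
        for c in classes
    }
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {c: math.exp(v - top) for c, v in scores.items()}
    z = sum(exps.values())
    return {c: v / z for c, v in exps.items()}


def train_gis(sequences, labels, k=2, iterations=200):
    """Fit feature weights with Generalized Iterative Scaling (GIS)."""
    classes = sorted(set(labels))
    data = [repeat_counts(s, k) for s in sequences]

    # GIS constant: the largest total feature count of any training sequence.
    # The slack feature that pads every sequence up to C does not depend on
    # the class, so it cancels in p(class | sequence) and is left implicit.
    C = max(sum(counts.values()) for counts in data)

    # Empirical feature totals: count of substring s in sequences labelled y.
    empirical = Counter()
    for counts, y in zip(data, labels):
        for s, n in counts.items():
            empirical[(s, y)] += n

    weights = {f: 0.0 for f in empirical}
    for _ in range(iterations):
        # Feature totals expected under the current model.
        expected = Counter()
        for counts in data:
            p = predict_proba(weights, classes, counts)
            for s, n in counts.items():
                for c in classes:
                    if (s, c) in weights:
                        expected[(s, c)] += p[c] * n
        # GIS update: move each weight by log(empirical / expected) / C.
        for f in weights:
            weights[f] += math.log(empirical[f] / expected[f]) / C
    return weights, classes


# Toy usage with made-up sequences and family labels (illustrative only).
toy_sequences = ["ABABAB", "ABABABAB", "CDCDCD", "CDCDCDCD"]
toy_labels = ["family1", "family1", "family2", "family2"]
toy_weights, toy_classes = train_gis(toy_sequences, toy_labels)
print(predict_proba(toy_weights, toy_classes, repeat_counts("ABAB")))
```

Because each weight moves by at most `log(empirical / expected) / C` per pass, and `C` grows with sequence length, convergence can take many iterations. This is consistent with the abstract's remark that maximum entropy models are accurate but slow.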

Pratibha Rani, Vikram Pudi

Biological Sequences | COMAD 2008 | Knowledge Management | Maximum Entropy | Maximum Entropy Classifier

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2008
Where	COMAD
Authors	Pratibha Rani, Vikram Pudi
