Classification of protein sequences by means of irredundant patterns

Background: The classification of protein sequences using string algorithms provides valuable insights for protein function prediction. Several methods, based on a variety of different patterns, have been proposed previously. Almost all string-based approaches discover patterns that are not "independent", and therefore the associated scores count the contribution of patterns that cover the same region of a sequence multiple times. Results: In this paper we use a class of patterns, called irredundant, that is specifically designed to address this issue. Loosely speaking, the set of irredundant patterns is the smallest class of "independent" patterns that can describe all common patterns in two sequences; thus they avoid overcounting. We present a novel discriminative method, called Irredundant Class, based on the statistics of irredundant patterns combined with the power of support vector machines. Conclusion: Tests on benchmark data show that Irre...
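The overcounting problem described above can be made concrete with a minimal Python sketch. This is not the authors' Irredundant Class method and does not compute irredundant patterns. It only counts every common substring of length at least `min_len` (a naive pattern score) and compares that count with the number of sequence positions those substrings actually cover. The function names, the `min_len` threshold, and the toy sequences are hypothetical, chosen for illustration.

```python
def common_substrings(s: str, t: str, min_len: int = 3) -> set[str]:
    """Distinct substrings of s, of length >= min_len, that also occur in t."""
    found = set()
    for i in range(len(s)):
        for j in range(i + min_len, len(s) + 1):
            if s[i:j] not in t:
                break  # any longer extension of s[i:j] cannot occur in t either
            found.add(s[i:j])
    return found


def naive_score(s: str, t: str, min_len: int = 3) -> int:
    """Counts every common pattern, even when patterns overlap the same region."""
    return len(common_substrings(s, t, min_len))


def covered_positions(s: str, t: str, min_len: int = 3) -> int:
    """Counts positions of s covered by at least one common pattern."""
    covered = set()
    for p in common_substrings(s, t, min_len):
        start = s.find(p)
        while start != -1:  # visit every (possibly overlapping) occurrence in s
            covered.update(range(start, start + len(p)))
            start = s.find(p, start + 1)
    return len(covered)


if __name__ == "__main__":
    s, t = "MKACDEFGLL", "WWACDEFGWW"  # hypothetical toy sequences sharing "ACDEFG"
    print(naive_score(s, t), covered_positions(s, t))  # 10 6
```

Here a single shared region of six residues yields ten overlapping common patterns (four of length 3, three of length 4, two of length 5, one of length 6), yet it covers only six positions. A score that sums over all ten patterns therefore rewards the same region repeatedly, which is the kind of redundancy that irredundant patterns are meant to remove.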
Matteo Comin, Davide Verzotto
Added 08 Dec 2010
Updated 08 Dec 2010
Type Journal
Year 2010