Sciweavers

BMCBI
2010

Protein sequences classification by means of feature extraction with substitution matrices

Background: This paper deals with the preprocessing of protein sequences for supervised classification. Motif extraction is one way to address this task. It has been widely used to encode biological sequences into feature vectors, the format required by well-known machine-learning classifiers. However, designing a suitable feature space for a set of proteins is not a trivial task. For this purpose, we propose a novel encoding method that uses amino-acid substitution matrices to define similarity between motifs during the extraction step. Results: To demonstrate the efficiency of this approach, we compare several encoding methods using several machine-learning classifiers. The experimental results show that our encoding method outperforms the others in terms of classification accuracy and number of generated attributes. We also compared the classifiers in terms of accuracy. Results indicate that SVM generally outperforms the other classifiers with any encod...
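The general idea of motif-based encoding with a substitution-aware similarity can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify. It assumes fixed-length n-gram motifs, a greedy merge of similar motifs, and a toy score table (`TOY_SCORES`, with hypothetical values standing in for a real substitution matrix such as BLOSUM62). All function names and the threshold are illustrative.

```python
# Toy pairwise scores for a few amino-acid pairs (hypothetical values,
# standing in for a real substitution matrix).
TOY_SCORES = {("I", "L"): 2, ("I", "V"): 3, ("L", "V"): 1,
              ("D", "E"): 2, ("K", "R"): 2}
MATCH_SCORE = 4      # assumed score for identical residues
DEFAULT_SCORE = -1   # assumed score for any pair not listed above


def residue_score(a, b):
    """Score two residues using the toy table (symmetric lookup)."""
    if a == b:
        return MATCH_SCORE
    return TOY_SCORES.get((a, b), TOY_SCORES.get((b, a), DEFAULT_SCORE))


def motif_similarity(m1, m2):
    """Sum of position-wise residue scores for two equal-length motifs."""
    return sum(residue_score(a, b) for a, b in zip(m1, m2))


def extract_ngrams(seq, n):
    """All overlapping substrings of length n (empty if seq is shorter)."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


def build_motif_set(sequences, n, threshold):
    """Greedily keep an n-gram only if no kept motif is similar enough."""
    representatives = []
    for seq in sequences:
        for gram in extract_ngrams(seq, n):
            if not any(motif_similarity(gram, r) >= threshold
                       for r in representatives):
                representatives.append(gram)
    return representatives


def encode(seq, representatives, n, threshold):
    """Binary vector: 1 if the sequence has an n-gram similar to the motif."""
    grams = extract_ngrams(seq, n)
    return [int(any(motif_similarity(g, r) >= threshold for g in grams))
            for r in representatives]


motifs = build_motif_set(["AIL", "AVL", "KDE"], n=3, threshold=10)
print(motifs)
print(encode("RDE", motifs, n=3, threshold=10))
```

Merging similar motifs in this way shrinks the feature space, because n-grams that differ only by conservative substitutions share one attribute. The resulting binary vectors could then be passed to any standard classifier.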
Added 08 Dec 2010
Updated 08 Dec 2010
Type Journal
Year 2010
Where BMCBI
Authors Rabie Saidi, Mondher Maddouri, Engelbert Mephu Nguifo