
BMCBI
2006

EVEREST: automatic identification and classification of protein domains in all protein sequences

Background: Proteins are composed of one or more building blocks, known as domains. Such domains can be classified into families according to their evolutionary origin. Although sequencing technologies have advanced immensely in recent years, there are no matching computational methodologies for large-scale determination of protein domains and their boundaries. We provide and rigorously evaluate a novel set of domain families that is automatically generated from sequence data. Our domain family identification process, called EVEREST (EVolutionary Ensembles of REcurrent SegmenTs), begins by constructing a library of protein segments that emerge in an all vs. all pairwise sequence comparison. It then clusters these segments into putative domain families. The best putative families are selected using machine learning techniques. A statistical model is then created for each of the chosen families. This procedure is then iterated: the aforementioned statistical mo...
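The cluster-then-select steps described in the abstract can be illustrated with a toy sketch. This is not the EVEREST implementation: the abstract does not name the clustering algorithm or the machine learning method, so the sketch assumes single-linkage grouping (connected components) over a precomputed list of similar segment pairs, and uses a simple size threshold as a stand-in for the learned selection step. The segment labels and the functions `cluster_segments` and `select_families` are hypothetical.

```python
from collections import defaultdict


def cluster_segments(segments, similar_pairs):
    """Group segments into putative families (assumption: connected components)."""
    parent = {s: s for s in segments}

    def find(x):
        # Follow parent links to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the groups of every pair reported as similar.
    for a, b in similar_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    families = defaultdict(set)
    for s in segments:
        families[find(s)].add(s)
    # Largest families first; ties broken by member names for determinism.
    return sorted((sorted(f) for f in families.values()),
                  key=lambda f: (-len(f), f))


def select_families(families, min_size=2):
    """Stand-in for the learned selection step: keep families above a size cutoff."""
    return [f for f in families if len(f) >= min_size]


# Hypothetical segments ("protein:start-end") and similar pairs, as might
# emerge from an all vs. all pairwise comparison.
segments = ["P1:10-80", "P2:5-70", "P3:100-170", "P4:1-50", "P5:20-60"]
similar_pairs = [
    ("P1:10-80", "P2:5-70"),
    ("P2:5-70", "P3:100-170"),
    ("P4:1-50", "P5:20-60"),
]

families = cluster_segments(segments, similar_pairs)
chosen = select_families(families, min_size=3)
```

Here `families` holds two putative families and `chosen` keeps only the three-member one. In a fuller pipeline each chosen family would then get its own statistical model, a step this sketch leaves out.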
Elon Portugaly, Amir Harel, Nathan Linial, Michal Linial
Added 10 Dec 2010
Updated 10 Dec 2010
Type Journal
Year 2006
Where BMCBI
Authors Elon Portugaly, Amir Harel, Nathan Linial, Michal Linial