BMC Bioinformatics (BMCBI)
2004

A hybrid clustering approach to recognition of protein families in 114 microbial genomes

Background: Grouping proteins into sequence-based clusters is a fundamental step in many bioinformatic analyses (e.g., homology-based prediction of structure or function). Standard clustering methods such as single-linkage clustering capture a history of cluster topologies as a function of threshold, but in practice their usefulness is limited because unrelated sequences join clusters before biologically meaningful families are fully constituted, e.g., as a result of matches to so-called promiscuous domains. The Markov Cluster algorithm avoids this non-specificity, but does not preserve topological or threshold information about protein families.

Results: We describe a hybrid approach to sequence-based clustering of proteins that combines the advantages of standard and Markov clustering. We have implemented this hybrid approach in a relational database environment, and describe its application to clustering a large subset of PDB and to 328,577 proteins from 114 fully sequenc...
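The abstract contrasts two behaviours: single-linkage clustering, whose merge history across thresholds is fully recoverable but which chains unrelated families together through one weak link, and the Markov Cluster (MCL) algorithm, which resists that chaining but yields only a single flat partition. The Python/NumPy sketch below illustrates both on a hypothetical six-protein toy graph in which one weak edge (score 0.1) stands in for a promiscuous-domain match. It is not the authors' hybrid method, whose details are not given here; the function names, scores, thresholds, and MCL parameters (expansion 2, inflation 2.0, pruning at 1e-5) are illustrative assumptions.

```python
import numpy as np


def single_linkage_history(n, edges, thresholds):
    """Single-linkage clusters at each threshold, from strictest to loosest.

    edges: (i, j, score) tuples; i and j are linked when score >= threshold.
    Thresholds are visited in decreasing order, so each level only adds merges
    to the previous one and the full merge history is retained.
    """
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    ordered = sorted(edges, key=lambda e: e[2], reverse=True)
    history, k = {}, 0
    for t in sorted(thresholds, reverse=True):
        while k < len(ordered) and ordered[k][2] >= t:
            i, j, _ = ordered[k]
            parent[find(i)] = find(j)  # union the two clusters
            k += 1
        groups = {}
        for x in range(n):
            groups.setdefault(find(x), []).append(x)
        history[t] = sorted(groups.values())
    return history


def mcl(adj, expansion=2, inflation=2.0, prune=1e-5, max_iter=100):
    """Basic MCL iteration on a symmetric similarity matrix (illustrative)."""
    m = adj.astype(float) + np.eye(adj.shape[0])  # add self-loops
    m /= m.sum(axis=0)  # make every column sum to 1
    for _ in range(max_iter):
        prev = m
        m = np.linalg.matrix_power(m, expansion)  # expansion: spread flow
        m = m ** inflation  # inflation: strengthen strong flows
        m[m < prune] = 0.0  # prune negligible entries
        m /= m.sum(axis=0)  # re-normalize columns
        if np.abs(m - prev).max() < 1e-12:
            break
    # Attractors have a nonzero diagonal; each attractor row spans one cluster.
    clusters = {frozenset(np.flatnonzero(m[i]).tolist())
                for i in range(m.shape[0]) if m[i, i] > 0}
    return sorted(sorted(c) for c in clusters)


# Hypothetical toy data: two tight families joined by one weak "bridge" hit.
edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
         (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0),
         (2, 3, 0.1)]
hist = single_linkage_history(6, edges, [0.5, 0.05])
adj = np.zeros((6, 6))
for i, j, s in edges:
    adj[i, j] = adj[j, i] = s

print(hist)  # {0.5: [[0, 1, 2], [3, 4, 5]], 0.05: [[0, 1, 2, 3, 4, 5]]}
print(mcl(adj))
```

At threshold 0.5 single linkage recovers the two families, but at 0.05 the single bridge merges them into one cluster; on this toy graph MCL keeps the two families apart, though it reports no threshold history. The hybrid approach described in the abstract combines the advantages of both.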
Type Journal
Authors Timothy J. Harlow, J. Peter Gogarten, Mark A. Ragan