Motivation: Efficient, accurate and automatic clustering of large protein sequence datasets, such as complete proteomes, into families, according to sequence similarity. Detection...
The rapid growth of available protein data makes clustering within families of proteins increasingly important; the challenge is to identify subfamilies of evolut...
Abdellali Kelil, Shengrui Wang, Ryszard Brzezinski
Given a point set S and an unknown metric d on S, we study the problem of efficiently partitioning S into k clusters while querying few distances between the points. In our model ...
Konstantin Voevodski, Maria-Florina Balcan, Heiko ...
We have recently described a method based on Artificial Neural Networks to cluster protein sequences into families. The network was trained with Kohonen’s unsupervised-learning al...
Recent studies in protein sequence analysis have leveraged the power of unlabeled data. For example, the profile and mismatch neighborhood kernels have shown significant improveme...