Sciweavers

SDM
2011
SIAM

Data Integration via Constrained Clustering: An Application to Enzyme Clustering

When multiple data sources are available for clustering, an a priori data integration process is usually required. This process may be costly and may not lead to good clusterings, since important information is likely to be discarded. In this paper we propose constrained clustering as a strategy for integrating data sources without losing any information. The approach adds the complementary data sources as constraints that the clustering algorithm must satisfy. As a concrete application, we focus on the problem of enzyme function prediction, a hard task that usually requires intensive experimental work. We use constrained clustering to integrate information from diverse sources as constraints, and analyze how this additional information affects clustering quality in an enzyme clustering scenario. Our results show that constraints generally improve clustering quality compared to an unconstrained clustering algorithm.
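To make the idea of "data sources as constraints" concrete, here is a minimal sketch in Python. The abstract does not say which constrained clustering algorithm the paper uses, so this assumes a COP-KMeans-style scheme: one source supplies feature vectors, and a second source is reduced to must-link and cannot-link pairs that every assignment has to respect. The toy points, the constraint pairs, and the names `cop_kmeans` and `violates` are all hypothetical and chosen only for illustration.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def violates(i, cluster, assign, must_link, cannot_link):
    """True if placing point i in `cluster` breaks a constraint
    involving a point that has already been assigned."""
    for a, b in must_link:
        other = b if a == i else a if b == i else None
        # An unassigned partner imposes no restriction yet.
        if other is not None and assign.get(other, cluster) != cluster:
            return True
    for a, b in cannot_link:
        other = b if a == i else a if b == i else None
        if other is not None and assign.get(other) == cluster:
            return True
    return False


def cop_kmeans(points, k, must_link=(), cannot_link=(), iters=20, seed=0):
    """k-means where each point goes to the nearest center that does
    not violate a constraint (assumed scheme, not the paper's)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    assign = {}
    for _ in range(iters):
        new_assign = {}
        for i, p in enumerate(points):
            ranked = sorted(range(k), key=lambda c: sq_dist(p, centers[c]))
            for c in ranked:
                if not violates(i, c, new_assign, must_link, cannot_link):
                    new_assign[i] = c
                    break
            else:
                raise ValueError(f"no feasible cluster for point {i}")
        if new_assign == assign:
            break  # assignments stopped changing
        assign = new_assign
        for c in range(k):
            members = [points[i] for i, a in assign.items() if a == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return [assign[i] for i in range(len(points))]


# Primary source: made-up 2-D feature vectors for six items.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
# Secondary source, expressed as pairwise constraints (hypothetical):
must_link = [(2, 3)]    # e.g. two items sharing an annotation
cannot_link = [(0, 5)]  # e.g. two items with conflicting annotations

labels = cop_kmeans(points, k=2, must_link=must_link,
                    cannot_link=cannot_link)
print(labels)
```

In this sketch the second data source never has to be merged into the feature vectors; it only restricts which assignments are allowed. Note that the greedy assignment can fail on constraint sets that are infeasible for a given point order, in which case the function raises an error rather than returning a partial clustering.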
Elisa Boari de Lima, Raquel Cardoso de Melo Minardi, Wagner Meira Jr., Mohammed Javeed Zaki
Added 17 Sep 2011
Updated 17 Sep 2011
Type Journal
Year 2011
Where SDM
Authors Elisa Boari de Lima, Raquel Cardoso de Melo Minardi, Wagner Meira Jr., Mohammed Javeed Zaki