
ECML
2006
Springer

Subspace Metric Ensembles for Semi-supervised Clustering of High Dimensional Data

A critical problem in clustering research is the definition of a proper metric to measure distances between points. Semi-supervised clustering uses information provided by the user, usually expressed as constraints, to guide the search for clusters. Learning effective metrics from constraints in high-dimensional spaces remains an open challenge. This is because the number of parameters to be estimated is quadratic in the number of dimensions, and we seldom have enough side-information to achieve accurate estimates. In this paper, we address the high dimensionality problem by learning an ensemble of subspace metrics. This is achieved by projecting the data and the constraints onto multiple subspaces and learning positive semi-definite similarity matrices therein. This methodology makes it possible to leverage the given side-information while solving lower-dimensional problems. Using high-dimensional data (e.g., microarray data), we demonstrate experimentally the superior accuracy achieve...
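The two key ideas above, the quadratic parameter count of a full metric and learning one positive semi-definite matrix per subspace, can be illustrated with a minimal NumPy sketch. This is not the authors' algorithm. The random feature-subset sampling, the RCA-style estimator (inverse of the must-link difference scatter plus a ridge term), the use of must-link constraints only, and averaging subspace distances as the ensemble rule are all assumptions made for illustration. The function names (`n_params_full`, `random_subspaces`, `learn_subspace_metric`, `ensemble_distance`) are hypothetical.

```python
import numpy as np

def n_params_full(D):
    """Free parameters of a symmetric D x D metric matrix: D(D+1)/2, quadratic in D."""
    return D * (D + 1) // 2

def random_subspaces(D, d, k, rng):
    """Hypothetical sampling scheme: k random feature subsets of size d."""
    return [rng.choice(D, size=d, replace=False) for _ in range(k)]

def learn_subspace_metric(X, must_link, dims, reg=1.0):
    """Assumed RCA-style estimator: a PSD matrix learned in one subspace.

    Projects the data and the must-link pairs onto `dims`, builds the scatter
    matrix of the pairwise differences, adds a ridge term, and inverts it.
    """
    Xs = X[:, dims]                                   # project data onto the subspace
    diffs = np.array([Xs[i] - Xs[j] for i, j in must_link])  # project constraints
    C = diffs.T @ diffs / len(diffs)                  # d x d scatter of must-link differences
    # Ridge term shrinks toward the identity and keeps C invertible
    # when there are fewer must-link pairs than subspace dimensions.
    C += reg * np.eye(len(dims))
    M = np.linalg.inv(C)                              # inverse of an SPD matrix is SPD
    return (M + M.T) / 2                              # remove tiny numerical asymmetry

def ensemble_distance(x, y, subspaces, metrics):
    """Assumed combination rule: average the Mahalanobis distances of all subspace metrics."""
    total = 0.0
    for dims, M in zip(subspaces, metrics):
        delta = x[dims] - y[dims]
        total += np.sqrt(delta @ M @ delta)
    return total / len(metrics)

# Usage on synthetic data: 50 points with D = 200 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 200))
must_link = [(0, 1), (2, 3), (4, 5), (6, 7), (8, 9)]
subspaces = random_subspaces(X.shape[1], d=10, k=5, rng=rng)
metrics = [learn_subspace_metric(X, must_link, dims) for dims in subspaces]
print(n_params_full(200), n_params_full(10))  # 20100 55
print(ensemble_distance(X[0], X[1], subspaces, metrics))
```

With D = 200, a single full metric has 20,100 free parameters, while each 10-dimensional subspace metric has only 55 (275 across five subspaces). The same few constraints therefore have to support far fewer estimates per problem. The abstract does not say how the paper combines the subspace metrics, so the averaging step here is only a placeholder.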
Bojun Yan, Carlotta Domeniconi
Type Conference