Combining partitions by probabilistic label aggregation

Data clustering is an important tool in exploratory data analysis. The lack of objective criteria renders model selection, as well as the identification of robust solutions, particularly difficult. Stability assessment and the combination of multiple clustering solutions are important ingredients in finding useful partitions. In this work, we propose a novel way of combining multiple clustering solutions for both hard and soft partitions: the approach is based on modeling the probability that two objects are grouped together. An efficient EM optimization strategy is employed to estimate the model parameters. Our proposal can also be extended to emphasize the signal more strongly by weighting individual base clustering solutions according to their consistency with the prediction for previously unseen objects. In addition, the probabilistic model supports an out-of-sample extension that (i) makes it possible to a...
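The abstract does not state the model's equations, so the following is only a minimal sketch under stated assumptions, not the authors' published algorithm. It first builds a co-association matrix, i.e. the (optionally weighted) probability that two objects share a cluster across the base partitions. Hard labels are one-hot encoded, and soft memberships contribute `sum_k u_ik * u_jk`. It then fits a symmetric latent class model P(i, j) ≈ Σ_c π_c q_ic q_jc to that matrix by EM. The function names `to_membership`, `co_association` and `latent_class_em`, the per-partition `weights` argument (a stand-in for the consistency-based weighting), and the choice of this particular latent class model are all hypothetical.

```python
import numpy as np


def to_membership(part):
    """Return an (n, k) membership matrix for one base partition (hypothetical helper)."""
    part = np.asarray(part)
    if part.ndim == 1:  # hard labels -> one-hot memberships
        _, idx = np.unique(part, return_inverse=True)
        return np.eye(idx.max() + 1)[idx.ravel()]
    return part.astype(float)  # soft partition: rows are cluster probabilities


def co_association(partitions, weights=None):
    """Weighted probability that objects i and j fall into the same cluster."""
    mats = [to_membership(p) for p in partitions]
    n = mats[0].shape[0]
    w = np.ones(len(mats)) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()  # assumed: weights are normalized to sum to one
    C = np.zeros((n, n))
    for wt, U in zip(w, mats):
        C += wt * (U @ U.T)  # P(i and j share a cluster) under this base partition
    return C


def latent_class_em(C, k, n_iter=200, seed=0):
    """Fit P(i, j) ~ sum_c pi_c * q[i, c] * q[j, c] to C by EM; return P(c | i)."""
    rng = np.random.default_rng(seed)
    n = C.shape[0]
    P = C / C.sum()  # treat the co-association matrix as a joint distribution over pairs
    pi = np.full(k, 1.0 / k)
    q = rng.random((n, k))
    q /= q.sum(axis=0, keepdims=True)  # each column is a distribution over objects
    for _ in range(n_iter):
        # E-step: responsibility of component c for pair (i, j)
        joint = pi[None, None, :] * q[:, None, :] * q[None, :, :]
        r = joint / np.maximum(joint.sum(axis=2, keepdims=True), 1e-300)
        # M-step: responsibilities weighted by the observed pair mass
        mass = P[:, :, None] * r
        pi = mass.sum(axis=(0, 1))
        pi /= pi.sum()
        # object i can occupy either position of a pair, so sum over both axes
        q = mass.sum(axis=1) + mass.sum(axis=0)
        q /= np.maximum(q.sum(axis=0, keepdims=True), 1e-300)
    post = pi[None, :] * q  # unnormalized P(c | i)
    return post / post.sum(axis=1, keepdims=True)


if __name__ == "__main__":
    parts = [
        [0, 0, 0, 1, 1, 1],
        [1, 1, 1, 0, 0, 0],
        [0, 0, 1, 1, 1, 1],
    ]
    C = co_association(parts)
    memberships = latent_class_em(C, k=2)
    print(np.round(C, 2))
    print(memberships.argmax(axis=1))
```

Working on the co-association matrix rather than on the raw labels sidesteps the label-correspondence problem between base partitions (label 0 in one partition need not match label 0 in another). The EM fit can reach a local optimum, so in practice several random seeds would typically be tried.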
Tilman Lange, Joachim M. Buhmann
Added 28 Jun 2010
Updated 28 Jun 2010
Type Conference
Year 2005
Where KDD