Sciweavers

ICAI
2004

A Comparison of Resampling Methods for Clustering Ensembles

Combination of multiple clusterings is an important task in the area of unsupervised learning. Inspired by the success of supervised bagging algorithms, we propose a resampling scheme for integration of multiple independent clusterings. Individual partitions in the ensemble are sequentially generated by clustering specially selected subsamples of the given data set. In this paper, we compare the efficacy of both subsampling (sampling without replacement) and bootstrap (with replacement) techniques in conjunction with several fusion algorithms. The empirical study shows that a meaningful consensus partition for an entire set of data points emerges from multiple clusterings of subsamples of small size. The purpose of this paper is to show that small subsamples generally suffice to represent the structure of the entire data set in the framework of clustering ensembles. Subsamples of small size can reduce computational cost and measurement complexity for many unsupervised data mining ta...
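The resampling scheme outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy k-means, the choice of a co-association matrix with a 0.5 threshold as the fusion step, the synthetic two-blob data, and all names (`draw_indices`, `co_association`, `consensus`, `runs`, `size`) are assumptions made for illustration only.

```python
import random

def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means on 2-D tuples; returns one label per point."""
    # Start from k distinct points so bootstrap duplicates cannot collide.
    centers = rng.sample(sorted(set(points)), k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: (p[0] - centers[c][0]) ** 2
                                              + (p[1] - centers[c][1]) ** 2)
                  for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # leave an empty cluster's center where it is
                centers[c] = (sum(m[0] for m in members) / len(members),
                              sum(m[1] for m in members) / len(members))
    return labels

def draw_indices(n, size, rng, bootstrap):
    """Bootstrap = sampling with replacement; subsampling = without."""
    if bootstrap:
        return [rng.randrange(n) for _ in range(size)]
    return rng.sample(range(n), size)

def co_association(points, k, runs, size, rng, bootstrap):
    """Fraction of runs in which two co-sampled points share a cluster."""
    n = len(points)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    for _ in range(runs):
        idx = draw_indices(n, size, rng, bootstrap)
        labels = kmeans([points[i] for i in idx], k, rng)
        # Duplicated indices have identical coordinates, hence identical labels.
        label_of = dict(zip(idx, labels))
        for a in label_of:
            for b in label_of:
                sampled[a][b] += 1
                together[a][b] += label_of[a] == label_of[b]
    return [[together[a][b] / sampled[a][b] if sampled[a][b] else 0.0
             for b in range(n)] for a in range(n)]

def consensus(coassoc, threshold=0.5):
    """Assumed fusion rule: connected components of the thresholded matrix."""
    n = len(coassoc)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            a = stack.pop()
            for b in range(n):
                if labels[b] == -1 and coassoc[a][b] > threshold:
                    labels[b] = current
                    stack.append(b)
        current += 1
    return labels

# Synthetic data (an assumption): two well-separated Gaussian blobs.
rng = random.Random(0)
points = ([(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(20)]
          + [(rng.gauss(10, 0.5), rng.gauss(10, 0.5)) for _ in range(20)])

# Each run clusters only 20 of the 40 points; the consensus covers all 40.
labels_sub = consensus(co_association(points, 2, 50, 20, rng, bootstrap=False))
labels_boot = consensus(co_association(points, 2, 50, 20, rng, bootstrap=True))
print(labels_sub)
print(labels_boot)
```

Setting `bootstrap=True` switches from subsampling to sampling with replacement. In this sketch a bootstrap sample is clustered with its duplicates included, but each distinct pair of points is counted once per run when the co-association matrix is updated.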
Added 31 Oct 2010
Updated 31 Oct 2010
Type Conference
Year 2004
Where ICAI
Authors Behrouz Minaei-Bidgoli, Alexander P. Topchy, William F. Punch