Sciweavers

NIPS
2008

On the Reliability of Clustering Stability in the Large Sample Regime

Clustering stability is an increasingly popular family of methods for performing model selection in data clustering. The basic idea is that the chosen model should be stable under perturbation or resampling of the data. Despite being reasonably effective in practice, these methods are not well understood theoretically, and present some difficulties. In particular, when the data is assumed to be sampled from an underlying distribution, the solutions returned by the clustering algorithm will usually become more and more stable as the sample size increases. This raises a potentially serious practical difficulty with these methods, because it means there might be some hard-to-compute sample size, beyond which clustering stability estimators 'break down' and become unreliable in detecting the most stable model. In this paper, we provide a set of general sufficient conditions, which ensure the reliability of clustering stability estimators in the large sample regime. In contrast t...
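The basic idea described above, choosing the number of clusters whose solution changes least under resampling, can be illustrated with a small sketch. This is not the estimator analyzed in the paper. It is a generic illustration that assumes k-means as the clustering algorithm, half-size subsamples, and a pair-counting disagreement measure. All function names (`kmeans`, `nearest`, `pair_disagreement`, `instability`) and the synthetic two-blob data are hypothetical choices made for this example.

```python
import random

def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's algorithm on a list of 2-D tuples; returns k centers."""
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Recompute each center as the mean of its group;
        # keep the old center if a group ends up empty.
        centers = [
            (sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g)) if g else c
            for g, c in zip(groups, centers)
        ]
    return centers

def nearest(p, centers):
    """Index of the center closest to point p (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda j: (p[0] - centers[j][0]) ** 2 + (p[1] - centers[j][1]) ** 2,
    )

def pair_disagreement(labels_a, labels_b):
    """Fraction of point pairs grouped together by one labeling but not the other."""
    n = len(labels_a)
    differ = total = 0
    for i in range(n):
        for j in range(i + 1, n):
            total += 1
            if (labels_a[i] == labels_a[j]) != (labels_b[i] == labels_b[j]):
                differ += 1
    return differ / total

def instability(points, k, rng, rounds=10, frac=0.5):
    """Average disagreement between clusterings fitted on two random subsamples."""
    m = int(frac * len(points))
    scores = []
    for _ in range(rounds):
        centers_a = kmeans(rng.sample(points, m), k, rng)
        centers_b = kmeans(rng.sample(points, m), k, rng)
        # Compare the two solutions on the full data set.
        labels_a = [nearest(p, centers_a) for p in points]
        labels_b = [nearest(p, centers_b) for p in points]
        scores.append(pair_disagreement(labels_a, labels_b))
    return sum(scores) / len(scores)

# Synthetic data (assumed for illustration): two well-separated Gaussian blobs.
rng = random.Random(0)
data = [(rng.gauss(cx, 0.3), rng.gauss(0.0, 0.3)) for cx in (0.0, 5.0) for _ in range(40)]

# Lower instability means a more stable model; k is picked by the smallest value.
scores = {k: instability(data, k, rng) for k in (2, 3, 4)}
best_k = min(scores, key=scores.get)
print(scores, best_k)
```

In this toy setting the instability for each candidate k is estimated from a fixed number of resampling rounds. The abstract's concern is what happens to such estimates as the sample size grows and all candidate models become increasingly stable.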
Ohad Shamir, Naftali Tishby
Added: 29 Oct 2010
Updated: 29 Oct 2010
Type: Conference
Year: 2008
Where: NIPS
Authors: Ohad Shamir, Naftali Tishby