K-means clustering via principal component analysis

Principal component analysis (PCA) is a widely used statistical technique for unsupervised dimension reduction. K-means clustering is a commonly used data clustering method for unsupervised learning tasks. Here we prove that principal components are the continuous solutions to the discrete cluster membership indicators for K-means clustering. Equivalently, we show that the subspace spanned by the cluster centroids is given by the spectral expansion of the data covariance matrix truncated at K - 1 terms. These results indicate that unsupervised dimension reduction is closely related to unsupervised learning. On dimension reduction, the result provides new insights into the observed effectiveness of PCA-based data reductions, beyond the conventional noise-reduction explanation. Mapping data points into a higher-dimensional space via kernels, we show that the solution for Kernel K-means is given by Kernel PCA. On learning, our results suggest effective techniques for K-means clustering. DNA gene express...
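The relaxation described in the abstract can be checked numerically. The sketch below is our own illustration, not the authors' code. For centered data Y, the K-means objective can be written as J_K = tr(YᵀY) - tr(HᵀYYᵀH), where H holds the normalized cluster indicators. Relaxing H to any orthonormal matrix gives the lower bound J_K >= tr(YᵀY) - (λ_1 + ... + λ_{K-1}), where λ_k are the largest eigenvalues of the scatter matrix, so every partition must satisfy it. As one assumed example of a "PCA-based" technique, the code clusters in the top K - 1 principal components. The function names (lloyd_kmeans, farthest_first, pca_guided_kmeans, kmeans_objective, pca_lower_bound), the farthest-first initialization, and the synthetic Gaussian blobs are all hypothetical choices made only for this illustration.

```python
import numpy as np


def lloyd_kmeans(X, centroids, n_iter=100):
    """Plain Lloyd iterations from the given initial centroids."""
    centroids = centroids.copy()
    for _ in range(n_iter):
        # Assign every point to its nearest centroid (squared Euclidean distance).
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = centroids.copy()
        for j in range(len(centroids)):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


def farthest_first(P, k):
    """Deterministic seeding (hypothetical choice): repeatedly pick the farthest point."""
    idx = [0]
    d = ((P - P[0]) ** 2).sum(axis=1)
    for _ in range(1, k):
        i = int(d.argmax())
        idx.append(i)
        d = np.minimum(d, ((P - P[i]) ** 2).sum(axis=1))
    return P[idx].copy()


def pca_guided_kmeans(X, k):
    """Cluster in the subspace of the top k-1 principal components."""
    Y = X - X.mean(axis=0)
    _, eigvecs = np.linalg.eigh(Y.T @ Y)  # eigenvalues in ascending order
    top = eigvecs[:, ::-1][:, : k - 1]    # top k-1 principal directions
    P = Y @ top                           # n x (k-1) PCA coordinates
    labels, _ = lloyd_kmeans(P, farthest_first(P, k))
    return labels


def kmeans_objective(X, labels):
    """Sum of squared distances of points to their own cluster mean."""
    return sum(((X[labels == j] - X[labels == j].mean(axis=0)) ** 2).sum()
               for j in np.unique(labels))


def pca_lower_bound(X, k):
    """tr(Y^T Y) minus the sum of the k-1 largest scatter-matrix eigenvalues."""
    Y = X - X.mean(axis=0)
    eigvals = np.linalg.eigvalsh(Y.T @ Y)
    return (Y ** 2).sum() - eigvals[::-1][: k - 1].sum()


# Synthetic data (hypothetical): three well-separated Gaussian blobs in 4-D.
rng = np.random.default_rng(0)
centers = np.array([[0, 0, 0, 0], [10, 0, 0, 0], [0, 10, 0, 0]], dtype=float)
X = np.vstack([c + rng.normal(size=(50, 4)) for c in centers])
k = 3

labels = pca_guided_kmeans(X, k)
J = kmeans_objective(X, labels)
bound = pca_lower_bound(X, k)
print(f"K-means objective J = {J:.2f}, PCA lower bound = {bound:.2f}")
```

Because the bound holds for every partition, J can never fall below it; when the clusters are well separated, the two numbers should be close. This is consistent with the claim that the centroid subspace is captured by the leading K - 1 principal components.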
Chris H. Q. Ding, Xiaofeng He
Added 17 Nov 2009
Updated 17 Nov 2009
Type Conference
Year 2004
Where ICML