K-means clustering via principal component analysis

Principal component analysis (PCA) is a widely used statistical technique for unsupervised dimension reduction. K-means clustering is a commonly used data clustering method for unsupervised learning tasks. Here we prove that principal components are the continuous solutions to the discrete cluster membership indicators for K-means clustering. Equivalently, we show that the subspace spanned by the cluster centroids is given by the spectral expansion of the data covariance matrix truncated at K - 1 terms. These results indicate that unsupervised dimension reduction is closely related to unsupervised learning. On dimension reduction, the result provides new insights into the observed effectiveness of PCA-based data reductions, beyond the conventional noise-reduction explanation. Mapping data points into a higher dimensional space via kernels, we show that the solution for Kernel K-means is given by Kernel PCA. On learning, our results suggest effective techniques for K-means clustering. DNA gene express...
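The relation stated in the abstract can be illustrated numerically for the two-cluster case (K = 2, so K - 1 = 1 principal direction). The sketch below is not the authors' code. It assumes synthetic, well-separated Gaussian data, and the helper names `pca_split` and `two_means` are hypothetical. It splits points by the sign of their projection onto the first principal direction, runs a plain Lloyd-style 2-means for comparison, and checks how closely the line through the two centroids aligns with that direction.

```python
import numpy as np


def pca_split(X):
    """Split points by the sign of their projection onto the first
    principal direction of the centered data (hypothetical helper)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions of the centered data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    v1 = Vt[0]
    return (Xc @ v1 > 0).astype(int), v1


def two_means(X, n_iter=50):
    """Plain Lloyd iterations for K = 2 (hypothetical helper).
    Assumed initialization: point 0 and the point farthest from it."""
    c0 = X[0]
    c1 = X[np.argmax(np.linalg.norm(X - c0, axis=1))]
    centers = np.stack([c0, c1])
    for _ in range(n_iter):
        # Distance from every point to both centers, then nearest center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new_centers = np.stack([X[labels == k].mean(axis=0) for k in range(2)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Assumed toy data: two Gaussian blobs in 5 dimensions, separated along axis 0.
rng = np.random.default_rng(0)
offset = np.zeros(5)
offset[0] = 5.0
X = np.vstack([
    rng.normal(size=(100, 5)) - offset,
    rng.normal(size=(100, 5)) + offset,
])

pca_labels, v1 = pca_split(X)
km_labels, centers = two_means(X)

# Cluster labels are arbitrary, so compare up to a label swap.
match = np.mean(pca_labels == km_labels)
agreement = max(match, 1.0 - match)

# Cosine between the centroid difference and the first principal direction
# (v1 has unit norm, so only the difference needs normalizing).
diff = centers[0] - centers[1]
cosine = abs(diff @ v1) / np.linalg.norm(diff)

print(f"label agreement: {agreement:.3f}")
print(f"|cos(centroid difference, PC1)|: {cosine:.3f}")
```

On data this cleanly separated, the two partitions should coincide and the cosine should be close to 1. On overlapping or strongly non-spherical clusters the sign split is only the continuous relaxation, and the K-means result may differ from it.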

Chris H. Q. Ding, Xiaofeng He

ICML 2004 | Kernel K-means | Machine Learning | Unsupervised Dimension Reduction | Unsupervised Learning Tasks |

Post Info

Added	17 Nov 2009
Updated	17 Nov 2009
Type	Conference
Year	2004
Where	ICML
Authors	Chris H. Q. Ding, Xiaofeng He
