Sciweavers

ICDM
2002
IEEE

Iterative Clustering of High Dimensional Text Data Augmented by Local Search

13 years 9 months ago
Iterative Clustering of High Dimensional Text Data Augmented by Local Search
The k-means algorithm with cosine similarity, also known as the spherical k-means algorithm, is a popular method for clustering document collections. However, spherical k-means can often yield qualitatively poor results, especially when cluster sizes are small, say 25-30 documents per cluster, where it tends to get stuck at a local maximum far away from the optimal solution. In this paper, we present a local search procedure, which we call “first-variation” that refines a given clustering by incrementally moving data points between clusters, thus achieving a higher objective function value. An enhancement of first variation allows a chain of such moves in a KernighanLin fashion and leads to a better local maximum. Combining the enhanced first-variation with spherical k-means yields a powerful “ping-pong” strategy that often qualitatively improves k-means clustering and is computationally efficient. We present several experimental results to highlight the improvement achie...
Inderjit S. Dhillon, Yuqiang Guan, J. Kogan
Added 14 Jul 2010
Updated 14 Jul 2010
Type Conference
Year 2002
Where ICDM
Authors Inderjit S. Dhillon, Yuqiang Guan, J. Kogan
Comments (0)