KAIS
2006

Fast and exact out-of-core and distributed k-means clustering

Clustering has been one of the most widely studied topics in data mining, and k-means is one of the most popular clustering algorithms. K-means requires several passes over the entire dataset, which can make it very expensive for large disk-resident datasets. Consequently, considerable work has been done on approximate versions of k-means that require only one or a small number of passes over the dataset. In this paper, we present a new algorithm, called Fast and Exact K-means Clustering (FEKM), which typically requires only one or a small number of passes over the entire dataset and provably produces the same cluster centers as the original k-means algorithm. The algorithm uses sampling to create initial cluster centers and then takes one or more passes over the entire dataset to adjust these cluster centers. We provide theoretical analysis to show that the cluster centers thus reported are the same as the ones computed by the original k-means algor...
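To make the pass-count argument concrete, here is a minimal Python sketch, assuming in-memory lists stand in for chunks read from disk. It shows the two ingredients the abstract names: a sample used to produce initial centers, and full passes over the data that adjust them. It is plain Lloyd-style k-means run one scan per iteration, not FEKM itself; it does not reproduce the paper's mechanism for guaranteeing exactness with fewer passes. All names (`read_chunks`, `one_pass`, `lloyd`, `sample_then_refine`) are hypothetical, and reservoir sampling is an assumed choice of sampling method.

```python
import random
from typing import Callable, Iterable, List, Sequence, Tuple

Point = Tuple[float, ...]
# Hypothetical reader: each call re-reads the dataset from the start, chunk by chunk.
ChunkReader = Callable[[], Iterable[List[Point]]]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p: Sequence[float], centers: List[Point]) -> int:
    """Index of the closest center (ties go to the lowest index)."""
    return min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))


def reservoir_sample(chunks: Iterable[List[Point]], size: int,
                     rng: random.Random) -> List[Point]:
    """Uniform sample of `size` points in one sequential pass (Algorithm R)."""
    sample: List[Point] = []
    seen = 0
    for chunk in chunks:
        for p in chunk:
            seen += 1
            if len(sample) < size:
                sample.append(p)
            else:
                j = rng.randrange(seen)  # keeps p with probability size / seen
                if j < size:
                    sample[j] = p
    return sample


def one_pass(chunks: Iterable[List[Point]], centers: List[Point]) -> List[Point]:
    """One k-means iteration as a single scan; only k sums and counts stay in memory."""
    k, dim = len(centers), len(centers[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for chunk in chunks:
        for p in chunk:
            j = nearest(p, centers)
            counts[j] += 1
            for d in range(dim):
                sums[j][d] += p[d]
    # An empty cluster keeps its previous center.
    return [tuple(v / counts[j] for v in sums[j]) if counts[j] else centers[j]
            for j in range(k)]


def lloyd(read_chunks: ChunkReader, centers: List[Point],
          max_passes: int = 100, tol: float = 1e-12) -> Tuple[List[Point], int]:
    """Repeat full scans until no center moves; return (centers, scans used)."""
    passes = 0
    while passes < max_passes:
        new_centers = one_pass(read_chunks(), centers)
        passes += 1
        shift = max(sq_dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    return centers, passes


def sample_then_refine(read_chunks: ChunkReader, k: int, sample_size: int,
                       seed: int = 0) -> Tuple[List[Point], int]:
    """Cluster a sample in memory, then adjust the centers with full scans.

    Returns the final centers and the total number of full scans of the data.
    """
    rng = random.Random(seed)
    sample = reservoir_sample(read_chunks(), sample_size, rng)  # scan 1
    init, _ = lloyd(lambda: [sample], sample[:k])                # in memory only
    centers, passes = lloyd(read_chunks, init)
    return centers, passes + 1


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 2.0), (2.0, 0.0), (2.0, 2.0),
            (10.0, 10.0), (10.0, 12.0), (12.0, 10.0), (12.0, 12.0)]
    # Stand-in for reading a disk-resident file three points at a time.
    read_chunks = lambda: (data[i:i + 3] for i in range(0, len(data), 3))
    # sample_size=len(data) only keeps this demo deterministic.
    centers, scans = sample_then_refine(read_chunks, k=2, sample_size=len(data))
    print(centers, scans)  # [(1.0, 1.0), (11.0, 11.0)] 2
```

Each call to `one_pass` is one full scan of the data, which is why plain k-means becomes expensive on disk-resident data. A good sample-based start can cut the number of scans, but in this sketch nothing guarantees the result matches k-means run from other starting centers; that guarantee is what the paper's analysis addresses.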
Ruoming Jin, Anjan Goswami, Gagan Agrawal
Added 13 Dec 2010
Updated 13 Dec 2010
Type Journal
Year 2006
Where KAIS
Authors Ruoming Jin, Anjan Goswami, Gagan Agrawal