Web-scale k-means clustering

We present two modifications to the popular k-means clustering algorithm to address the extreme requirements for latency, scalability, and sparsity encountered in user-facing web applications. First, we propose the use of mini-batch optimization for k-means clustering. This reduces computation cost by orders of magnitude compared to the classic batch algorithm while yielding significantly better solutions than online stochastic gradient descent. Second, we achieve sparsity with projected gradient descent, and give a fast ε-accurate projection onto the L1-ball. Source code is freely available:
Categories and Subject Descriptors: I.5.3 [Computing Methodologies]: Pattern Recognition - Clustering
General Terms: Algorithms, Performance, Experimentation
Keywords: unsupervised clustering, scalability, sparse solutions
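The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the author's released code: the function names, parameters (`batch_size`, `n_iter`, `l1_radius`, `eps`), the per-center learning rate of 1/count, and the bisection-based projection are all assumptions made for illustration, based only on the description above.

```python
import numpy as np


def project_l1_ball(c, radius, eps=1e-3, max_iter=100):
    """Approximately project c onto the L1-ball of the given radius.

    Assumed approach: bisection on a soft-threshold value theta until the
    L1 norm of the thresholded vector lies in [radius, radius * (1 + eps)].
    """
    c = np.asarray(c, dtype=float)
    if np.abs(c).sum() <= radius * (1 + eps):
        return c.copy()  # already (approximately) inside the ball
    lower, upper = 0.0, np.abs(c).max()
    theta = 0.0
    for _ in range(max_iter):
        theta = 0.5 * (lower + upper)
        norm = np.maximum(np.abs(c) - theta, 0.0).sum()
        if norm > radius * (1 + eps):
            lower = theta  # threshold too small, shrink more
        elif norm < radius:
            upper = theta  # threshold too large, shrink less
        else:
            break
    return np.sign(c) * np.maximum(np.abs(c) - theta, 0.0)


def mini_batch_kmeans(X, k, batch_size=100, n_iter=100, l1_radius=None, seed=0):
    """Mini-batch k-means sketch with an optional L1 projection of the centers."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Initialize centers with k distinct examples drawn at random.
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    counts = np.zeros(k)
    for _ in range(n_iter):
        idx = rng.choice(len(X), size=min(batch_size, len(X)), replace=False)
        batch = X[idx]
        # Assign every example in the batch to its nearest center first.
        dists = ((batch[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        nearest = dists.argmin(axis=1)
        # Then take a gradient step per example with a per-center rate.
        for x, j in zip(batch, nearest):
            counts[j] += 1
            eta = 1.0 / counts[j]
            centers[j] = (1.0 - eta) * centers[j] + eta * x
        if l1_radius is not None:
            # Projected gradient step: pull each center back onto the L1-ball.
            for j in range(k):
                centers[j] = project_l1_ball(centers[j], l1_radius)
    return centers


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    data = np.vstack([
        rng.normal(0.0, 0.5, size=(200, 2)),
        rng.normal(10.0, 0.5, size=(200, 2)),
    ])
    print(mini_batch_kmeans(data, k=2, batch_size=50, n_iter=50))
```

Each iteration touches only `batch_size` examples rather than the full data set, which is where the cost saving over batch k-means comes from. Setting `l1_radius` trades some clustering accuracy for sparser center vectors.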
D. Sculley
Added 14 May 2010
Updated 14 May 2010
Type Conference
Year 2010
Where WWW
Authors D. Sculley