Sciweavers

IPPS
2010
IEEE

Large-scale multi-dimensional document clustering on GPU clusters

13 years 2 months ago
Large-scale multi-dimensional document clustering on GPU clusters
Document clustering plays an important role in data mining systems. Recently, a flocking-based document clustering algorithm has been proposed to solve the problem through simulation resembling the flocking behavior of birds in nature. This method is superior to other clustering algorithms, including k-means, in the sense that the outcome is not sensitive to the initial state. One limitation of this approach is that the algorithmic complexity is inherently quadratic in the number of documents. As a result, execution time becomes a bottleneck with large number of documents. In this paper, we assess the benefits of exploiting the computational power of Beowulf-like clusters equipped with contemporary Graphics Processing Units (GPUs) as a means to significantly reduce the runtime of flocking-based document clustering. Our framework scales up to over one million documents processed simultaneously in a sixteen-node moderate GPU cluster. Results are also compared to a four-node cluster with...
Yongpeng Zhang, Frank Mueller, Xiaohui Cui, Thomas
Added 13 Feb 2011
Updated 13 Feb 2011
Type Journal
Year 2010
Where IPPS
Authors Yongpeng Zhang, Frank Mueller, Xiaohui Cui, Thomas E. Potok
Comments (0)