Sciweavers

DMIN
2008

PCS: An Efficient Clustering Method for High-Dimensional Data

Clustering algorithms play an important role in data analysis and information retrieval. How to obtain a clustering for a large set of high-dimensional data suitable for database applications remains a challenge. In this paper we devise a set-theoretic clustering method called PCS (Pairwise Consensus Scheme) for high-dimensional data. Given a large set of d-dimensional data, PCS first constructs $\binom{d}{p}$ clusterings, where $p \ll d$ is a small number (e.g., p = 2 or p = 3) and each clustering is constructed on data projected to a combination of p selected dimensions using an existing p-dimensional clustering algorithm. PCS then constructs, using a greedy pairwise comparison technique based on a recent clustering algorithm [1], a near-optimal consensus clustering from these projected clusterings to be the final clustering of the original data set. We show that PCS incurs only a moderate I/O cost, and the memory requirement is independent of the data size. Finally, we carry out numerical experi...
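The two-stage structure described in the abstract (cluster every p-dimensional projection, then merge the results into one consensus clustering) can be illustrated with a minimal sketch. This is not the authors' implementation: `grid_cluster` is a hypothetical stand-in for the "existing p-dimensional clustering algorithm", and `majority_consensus` is a simple co-association majority vote substituted for the greedy pairwise comparison technique, which the abstract does not specify. Unlike PCS, this sketch keeps everything in memory and compares all pairs of points.

```python
from itertools import combinations
from math import comb


def grid_cluster(points, dims, cell=1.0):
    """Hypothetical placeholder clusterer: points whose projection onto
    `dims` falls in the same grid cell of width `cell` share a label."""
    seen = {}
    labels = []
    for pt in points:
        key = tuple(int(pt[j] // cell) for j in dims)
        labels.append(seen.setdefault(key, len(seen)))
    return labels


def projected_clusterings(points, p, cell=1.0):
    """Build one clustering per combination of p dimensions (C(d, p) in total)."""
    d = len(points[0])
    return {dims: grid_cluster(points, dims, cell)
            for dims in combinations(range(d), p)}


def majority_consensus(clusterings, n):
    """Assumed stand-in for the consensus step: link two points when they
    share a cluster in more than half of the projected clusterings, then
    return the connected components (via union-find) as the final labels."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    labelings = list(clusterings.values())
    for i, j in combinations(range(n), 2):
        votes = sum(1 for lab in labelings if lab[i] == lab[j])
        if 2 * votes > len(labelings):
            parent[find(i)] = find(j)

    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Toy data: d = 3, p = 2, so C(3, 2) = 3 projected clusterings.
points = [(0.1, 0.2, 0.3), (0.4, 0.5, 0.6), (5.1, 5.2, 5.3), (5.4, 5.5, 5.6)]
clusterings = projected_clusterings(points, p=2)
final = majority_consensus(clusterings, len(points))
```

With the toy data, the first two points and the last two points end up in separate consensus clusters. The number of projections grows as C(d, p), for example 45 for d = 10 and p = 2, which is why p is kept small.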
Wei Li 0011, Cindy Chen, Jie Wang
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference
Year 2008
Where DMIN
Authors Wei Li 0011, Cindy Chen, Jie Wang