
ICDM
2002
IEEE

O-Cluster: Scalable Clustering of Large High Dimensional Data Sets

Clustering large, high-dimensional data sets has long been a serious challenge for clustering algorithms. Many recently developed clustering algorithms address either data sets with a very large number of records or data sets with a very high number of dimensions. This paper discusses the advantages and limitations of existing algorithms when they operate on very large multidimensional data sets. To overcome both the “curse of dimensionality” and the scalability problems associated with large amounts of data, we propose a new clustering algorithm called O-Cluster. The method combines a novel active sampling technique with an axis-parallel partitioning strategy to identify continuous areas of high density in the input space. It operates on a limited memory buffer and requires at most a single scan through the data. We demonstrate the high quality of the obtained clustering solutions, their robustnes...
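The axis-parallel partitioning idea mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it leaves out active sampling and the limited memory buffer entirely, and it assumes a simple rule (split at the deepest low-count histogram bin that lies between two denser regions) that the abstract does not specify. The names `histogram`, `deepest_valley`, `find_split`, `bins`, and `min_depth` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]


def histogram(values: List[float], bins: int) -> Tuple[List[int], float, float]:
    """Equal-width histogram; returns (counts, lower bound, bin width)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant columns
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    return counts, lo, width


def deepest_valley(counts: List[int]) -> Tuple[int, int]:
    """Return (bin index, depth) of the deepest valley, or (-1, 0) if none.

    Depth of bin i = min(tallest bin to its left, tallest bin to its right)
    minus the count in bin i. This rule is an assumption of the sketch.
    """
    best_idx, best_depth = -1, 0
    for i in range(1, len(counts) - 1):
        depth = min(max(counts[:i]), max(counts[i + 1:])) - counts[i]
        if depth > best_depth:
            best_idx, best_depth = i, depth
    return best_idx, best_depth


def find_split(points: List[Point], bins: int = 10,
               min_depth: int = 5) -> Optional[Tuple[int, float]]:
    """Pick the axis-parallel cut (dimension, value) with the deepest valley.

    Returns None when no dimension has a valley of at least min_depth,
    i.e. the points look like a single dense region under this rule.
    """
    best: Optional[Tuple[int, float]] = None
    best_depth = min_depth - 1
    for dim in range(len(points[0])):
        counts, lo, width = histogram([p[dim] for p in points], bins)
        idx, depth = deepest_valley(counts)
        if depth > best_depth:
            best_depth = depth
            best = (dim, lo + (idx + 0.5) * width)  # cut at the valley bin's centre
    return best
```

Applying `find_split` recursively to the points on each side of the returned cut would give a hierarchy of axis-parallel partitions; a partition for which it returns `None` would be treated as one cluster in this simplified picture.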
Boriana L. Milenova, Marcos M. Campos
Added 14 Jul 2010
Updated 14 Jul 2010
Type Conference
Year 2002
Where ICDM
Authors Boriana L. Milenova, Marcos M. Campos