SDM
2003
SIAM

Finding Clusters of Different Sizes, Shapes, and Densities in Noisy, High Dimensional Data

The problem of finding clusters in data is challenging when clusters are of widely differing sizes, densities and shapes, and when the data contains large amounts of noise and outliers. Many of these issues become even more significant when the data is of very high dimensionality, such as text or time series data. In this paper we present a novel clustering technique that addresses these issues. Our algorithm first finds the nearest neighbors of each data point and then redefines the similarity between pairs of points in terms of how many nearest neighbors the two points share. Using this new definition of similarity, we eliminate noise and outliers, identify core points, and then build clusters around the core points. The use of a shared nearest neighbor definition of similarity removes problems with varying density, while the use of core points handles problems with shape and size. We experimentally show that our algorithm performs better than traditional methods (e.g., K-means) on ...
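The steps named in the abstract (nearest-neighbor lists, shared-nearest-neighbor similarity, core points, noise removal, cluster growth) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no thresholds or expansion rule, so the parameter names `k`, `eps`, and `min_pts`, the mutual-neighbor requirement, and the DBSCAN-style growth from core points are all assumptions made for illustration. The brute-force neighbor search is only suitable for small inputs.

```python
from math import dist  # Euclidean distance, Python 3.8+


def knn_lists(points, k):
    """For each point, return the set of indices of its k nearest neighbors."""
    neighbors = []
    for i, p in enumerate(points):
        others = (j for j in range(len(points)) if j != i)
        order = sorted(others, key=lambda j: dist(p, points[j]))
        neighbors.append(set(order[:k]))
    return neighbors


def snn_similarity(neighbors, i, j):
    """Number of nearest neighbors shared by points i and j.

    Assumption: the similarity is zero unless i and j appear in each
    other's neighbor lists.
    """
    if i not in neighbors[j] or j not in neighbors[i]:
        return 0
    return len(neighbors[i] & neighbors[j])


def snn_cluster(points, k=5, eps=2, min_pts=3):
    """Label each point with a cluster id, or -1 for noise.

    k, eps, and min_pts are hypothetical parameters for this sketch.
    """
    n = len(points)
    nbrs = knn_lists(points, k)

    # Strong links: neighbor pairs whose shared-neighbor count reaches eps.
    strong = [
        [j for j in nbrs[i] if snn_similarity(nbrs, i, j) >= eps]
        for i in range(n)
    ]

    # Core points: points with at least min_pts strong links.
    core = [len(strong[i]) >= min_pts for i in range(n)]

    # Grow clusters outward from core points; points never reached stay -1.
    labels = [-1] * n
    cluster = 0
    for i in range(n):
        if not core[i] or labels[i] != -1:
            continue
        labels[i] = cluster
        stack = [i]
        while stack:
            u = stack.pop()
            for v in strong[u]:
                if labels[v] == -1:
                    labels[v] = cluster
                    if core[v]:  # only core points extend the cluster
                        stack.append(v)
        cluster += 1
    return labels
```

Calling `snn_cluster(points, k=4, eps=2, min_pts=2)` on a list of coordinate tuples returns one label per point. Because similarity is counted in shared neighbors rather than raw distance, the same `eps` applies to dense and sparse regions alike, which is the property the abstract attributes to the shared nearest neighbor definition.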
Levent Ertöz, Michael Steinbach, Vipin Kumar
Added 01 Nov 2010
Updated 01 Nov 2010
Type Conference
Year 2003
Where SDM