Scalable Clustering for Large High-Dimensional Data Based on Data Summarization

Clustering large data sets with high dimensionality is a challenging data-mining task. This paper presents a framework to perform such a task efficiently. It is based on the notion of data space reduction, which finds high density areas, or dense cells, in the given feature space. The dense cells store summarized information of the data. A designated partitioning or hierarchical clustering algorithm can be used as the second step to find clusters based on the data summaries. Using Kmeans as an example, this paper presents GARDEN-Kmeans, which performs data space reduction using Gamma Region DENsity partition, and utilizes Kmeans to cluster the summarized information. The experimental study shows that GARDEN-Kmeans executes several orders of magnitude faster than basic Kmeans and the recursive bisection Kmeans algorithm of CLUTO, while producing comparable clustering quality.
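The two-step idea described in the abstract (summarize the data into dense cells, then cluster the summaries) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's GARDEN-Kmeans: a plain uniform grid stands in for the Gamma Region DENsity partition, and the function names and the `bins` and `min_points` parameters are hypothetical choices made for this example.

```python
import numpy as np


def summarize_dense_cells(X, bins=8, min_points=5):
    """Bucket points into a uniform grid (an assumed stand-in for the
    paper's partition) and keep one summary per dense cell.

    Returns (centroids, counts): the mean point and the point count
    of every cell holding at least `min_points` points.
    """
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero
    # Integer cell coordinates; clip so the maximum falls in the last bin.
    idx = np.minimum(((X - lo) / span * bins).astype(int), bins - 1)
    # Group points that share the same cell coordinates.
    _, inverse, counts = np.unique(
        idx, axis=0, return_inverse=True, return_counts=True
    )
    inverse = inverse.reshape(-1)
    sums = np.zeros((len(counts), X.shape[1]))
    np.add.at(sums, inverse, X)  # per-cell coordinate sums
    dense = counts >= min_points
    return sums[dense] / counts[dense, None], counts[dense]


def weighted_kmeans(points, weights, k, iters=50, seed=0):
    """Lloyd-style K-means in which each point carries a weight
    (here: the number of original points its cell summarizes)."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(points), size=k, replace=False,
                       p=weights / weights.sum())
    centers = points[start].astype(float)
    for _ in range(iters):
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):
            mask = labels == j
            if mask.any():
                new_centers[j] = np.average(
                    points[mask], axis=0, weights=weights[mask]
                )
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    # Final assignment against the returned centers.
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, d2.argmin(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X = np.vstack([
        rng.normal(0.0, 0.5, size=(500, 2)),
        rng.normal(10.0, 0.5, size=(500, 2)),
    ])
    centroids, counts = summarize_dense_cells(X)
    centers, _ = weighted_kmeans(centroids, counts, k=2)
    print(len(X), "points summarized into", len(centroids), "dense cells")
    print(centers)
```

In this sketch the clustering step only ever sees one weighted centroid per dense cell, so its cost depends on the number of dense cells rather than on the number of original points, which is the kind of saving the abstract attributes to data space reduction.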

Ying Lai, Ratko Orlandic, Wai Gen Yee, Sachin Kulkarni

Artificial Intelligence | CIDM 2007 | Data Space Reduction | Dense Cells | Hierarchical Clustering Algorithm |

» DensEst Density Estimation for Data Mining in High Dimensional Spaces

» Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications

» HDEye Visual Clustering of High dimensional Data

» PCS An Efficient Clustering Method for High-Dimensional Data

» Algorithms for Bounded-Error Correlation of High Dimensional Data in Microarray Experiments

» CARE Finding Local Linear Correlations in High Dimensional Data

» Finding Generalized Projected Clusters In High Dimensional Spaces

» Selectivity Estimation of High Dimensional Window Queries via Clustering

Post Info

Added	02 Jun 2010
Updated	02 Jun 2010
Type	Conference
Year	2007
Where	CIDM
Authors	Ying Lai, Ratko Orlandic, Wai Gen Yee, Sachin Kulkarni
