The "Best K" for Entropy-based Categorical Data Clustering

With the growing demand for cluster analysis of categorical data, a handful of categorical clustering algorithms have been developed. Surprisingly, to our knowledge, none has satisfactorily addressed an important problem in categorical clustering: how can we determine the best number of clusters K for a categorical dataset? Because categorical data has no inherent distance function to serve as a similarity measure, traditional cluster validation techniques based on geometric shape and density distribution cannot be applied to answer this question. In this paper, we investigate the entropy property of categorical data and propose a BkPlot method for determining a set of candidate “best Ks”. This method is implemented with a hierarchical clustering algorithm, HierEntro. The experimental results show that our approach can effectively identify significant clustering structures.

Keywords: Categorical Data Clustering, Entropy, Cluster Validation
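
The abstract does not define the entropy criterion, so the following is only a minimal sketch of one common way to score a categorical partition: the size-weighted average of per-cluster entropies, where a cluster's entropy is taken as the sum of the Shannon entropies of its attributes. That definition, and the function names `column_entropy`, `cluster_entropy`, and `expected_entropy`, are assumptions made for illustration. They are not taken from the paper and are not its BkPlot or HierEntro implementation.

```python
from collections import Counter
from math import log2


def column_entropy(values):
    """Shannon entropy (in bits) of one categorical column."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def cluster_entropy(rows):
    """Sum of per-attribute entropies over the rows of one cluster.

    Assumption: attributes are treated as independent, so the cluster
    entropy is the sum of the column entropies.
    """
    return sum(column_entropy(col) for col in zip(*rows))


def expected_entropy(rows, labels):
    """Size-weighted average of cluster entropies for a partition."""
    clusters = {}
    for row, label in zip(rows, labels):
        clusters.setdefault(label, []).append(row)
    n = len(rows)
    return sum(
        len(members) / n * cluster_entropy(members)
        for members in clusters.values()
    )


# Toy dataset: two attributes, two obviously distinct groups.
rows = [("a", "x"), ("a", "x"), ("b", "y"), ("b", "y")]
one_cluster = expected_entropy(rows, [0, 0, 0, 0])
two_clusters = expected_entropy(rows, [0, 0, 1, 1])
```

On this toy data, keeping all four records in one cluster gives an expected entropy of 2 bits, while splitting them into the two homogeneous groups gives 0. Comparing how much this score changes between successive values of K is one way to look for candidate cluster counts, which is the kind of question the abstract raises.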

Keke Chen, Ling Liu

Categorical Clustering | Categorical Clustering Algorithms | Categorical Data | Database | SSDBM 2005 |

Added	25 Jun 2010
Updated	25 Jun 2010
Type	Conference
Year	2005
Where	SSDBM
Authors	Keke Chen, Ling Liu
