A general model for clustering binary data

9 years 11 months ago
A general model for clustering binary data
Clustering is the problem of identifying the distribution of patterns and intrinsic correlations in large data sets by partitioning the data points into similarity classes. This paper studies the problem of clustering binary data. This is the case for market basket datasets where the transactions contain items and for document datasets where the documents contain "bag of words". The contribution of the paper is three-fold. First a general binary data clustering model is presented. The model treats the data and features equally, based on their symmetric association relations, and explicitly describes the data assignments as well as feature assignments. We characterize several variations with different optimization procedures for the general model. Second, we also establish the connections between our clustering model with other existing clustering methods. Third, we also discuss the problem for determining the number of clusters for binary clustering. Experimental results sho...
Tao Li
Added 30 Nov 2009
Updated 30 Nov 2009
Type Conference
Year 2005
Where KDD
Authors Tao Li
Comments (0)