ICDM
2002
IEEE

Using Category-Based Adherence to Cluster Market-Basket Data

In this paper, we devise an efficient algorithm for clustering market-basket data. Unlike traditional data, market-basket data are known to be high-dimensional and sparse and to contain massive outliers. Clustering transactions across different levels of the taxonomy is of great importance both for marketing strategies and for representing the results of clustering techniques for market-basket data. In view of these features, we devise a novel measurement, called the category-based adherence, and use it to perform the clustering. The distance of an item to a given cluster is defined as the number of links between this item and its nearest large node in the taxonomy tree, where a large node is an item (i.e., leaf) or a category (i.e., internal) node whose occurrence count exceeds a given threshold. The category-based adherence of a transaction to a cluster is then defined as the average di...
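
The item-to-cluster distance described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the taxonomy is given as a child-to-parent map, that the per-cluster occurrence counts of every node are already computed, and that the "nearest large node" is found by walking upward from the item toward the root. The function name, the toy taxonomy, the counts, and the threshold are all hypothetical.

```python
from typing import Dict, Optional

def distance_to_nearest_large_node(
    item: str,
    parent: Dict[str, Optional[str]],
    counts: Dict[str, int],
    threshold: int,
) -> Optional[int]:
    """Count the links from `item` up to the first node whose count exceeds `threshold`.

    Assumption: the nearest large node is the item itself or one of its ancestors.
    Returns None when no node on the path to the root is large.
    """
    links = 0
    node: Optional[str] = item
    while node is not None:
        if counts.get(node, 0) > threshold:
            return links
        node = parent.get(node)  # move one link up the taxonomy tree
        links += 1
    return None

# Hypothetical taxonomy: food -> {dairy -> {milk, cheese}, bakery -> {bread}}
PARENT = {
    "food": None,
    "dairy": "food",
    "bakery": "food",
    "milk": "dairy",
    "cheese": "dairy",
    "bread": "bakery",
}

# Hypothetical occurrence counts of each node within one cluster
COUNTS = {"milk": 5, "cheese": 1, "dairy": 6, "bread": 1, "bakery": 1, "food": 7}
THRESHOLD = 3

if __name__ == "__main__":
    for it in ("milk", "cheese", "bread"):
        print(it, distance_to_nearest_large_node(it, PARENT, COUNTS, THRESHOLD))
```

In this toy cluster, `milk` is itself large, `cheese` reaches the large category `dairy` after one link, and `bread` only reaches a large node at the root. The adherence of a whole transaction is left out because its definition is truncated in the abstract.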
Ching-Huang Yun, Kun-Ta Chuang, Ming-Syan Chen
Added 14 Jul 2010
Updated 14 Jul 2010
Type Conference
Year 2002
Where ICDM