SDM
2008
SIAM

Efficient Distribution Mining and Classification

We define and solve the problem of "distribution classification" and, more generally, "distribution mining". Given n distributions (i.e., clouds) of multi-dimensional points, we want to classify them into k classes and to find patterns, rules, and outlier clouds. For example, consider the 2-d case of item sales where, for each item sold, we record the unit price and quantity; each customer is then represented as a distribution/cloud of 2-d points (one for each item the customer bought). We want to group similar customers together, e.g., for market segmentation or anomaly/fraud detection. We propose D-Mine to achieve this goal. Our main contribution is Theorem 3.1, which shows how to use wavelets to speed up the cloud-similarity computations. Extensive experiments on both synthetic and real multi-dimensional data sets show that our method achieves up to 400 times faster wall-clock time than the naive implementation, with comparable (and occasionally better) classification quality.
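To make the "customer as a cloud of 2-d points" setting concrete, the following is a minimal sketch of a naive cloud-similarity computation. It rests on assumptions that are not stated in the abstract: each cloud is bucketized into a fixed grid, and two clouds are compared with a symmetric Kullback-Leibler divergence over the resulting histograms. The function names, the bucket count, and the synthetic customers are hypothetical. The sketch does not use wavelets and does not reproduce Theorem 3.1; it only illustrates the kind of pairwise comparison that a faster method would need to speed up.

```python
import numpy as np


def cloud_histogram(points, bins=8, value_range=((0.0, 1.0), (0.0, 1.0))):
    """Bucketize a cloud of 2-d points into a smoothed, normalized histogram.

    Assumption: both coordinates are already scaled to [0, 1]; points that
    fall outside the range are ignored by np.histogram2d.
    """
    points = np.asarray(points, dtype=float)
    hist, _, _ = np.histogram2d(
        points[:, 0], points[:, 1], bins=bins, range=value_range
    )
    hist = hist + 1e-6  # small smoothing term so empty buckets avoid log(0)
    return hist / hist.sum()


def symmetric_kl(p, q):
    """Symmetric KL divergence, KL(p||q) + KL(q||p), between two histograms."""
    return float(np.sum((p - q) * np.log(p / q)))


# Hypothetical customers: each row is (unit price, quantity), scaled to [0, 1].
rng = np.random.default_rng(0)
bargain_a = rng.normal([0.2, 0.7], 0.05, size=(200, 2))  # cheap items, many units
bargain_b = rng.normal([0.2, 0.7], 0.05, size=(200, 2))  # same buying pattern
luxury = rng.normal([0.8, 0.2], 0.05, size=(200, 2))     # expensive items, few units

h_a, h_b, h_lux = (cloud_histogram(c) for c in (bargain_a, bargain_b, luxury))

d_similar = symmetric_kl(h_a, h_b)      # two customers with the same pattern
d_different = symmetric_kl(h_a, h_lux)  # customers with different patterns
print(d_similar < d_different)
```

In this sketch every pairwise comparison touches every bucket, so the cost grows with both the number of clouds and the grid resolution, which is the sort of expense a compressed representation is meant to reduce.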
Added 30 Oct 2010
Updated 30 Oct 2010
Type Conference
Year 2008
Where SDM
Authors Yasushi Sakurai, Rosalynn Chong, Lei Li, Christos Faloutsos