
KDD
2003
ACM

Generative model-based clustering of directional data

High-dimensional directional data is becoming increasingly important in contemporary applications such as the analysis of text and gene-expression data. A natural model for multivariate directional data is the von Mises-Fisher (vMF) distribution on the unit hypersphere, which is analogous to the multivariate Gaussian distribution in R^d. In this paper, we propose modeling complex directional data as a mixture of vMF distributions. We derive and analyze two variants of the Expectation Maximization (EM) framework for estimating the parameters of this mixture, and we propose two clustering algorithms corresponding to these variants. An interesting aspect of our methodology is that the spherical k-means algorithm (k-means with cosine similarity) can be shown to be a special case of both algorithms. As part of the experimental validation, we present results on clustering high-dimensional text and gene-expression data as a mixture of vMF distributions. The results indicate that our ap...
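The following minimal Python sketch illustrates the mechanics. It is not the authors' implementation. The vMF density on the unit hypersphere is f(x | mu, kappa) = C_d(kappa) exp(kappa * mu^T x), with a unit mean direction mu and a concentration kappa >= 0. For simplicity, the sketch assumes that all k components share one fixed kappa and equal mixing weights. Under that assumption C_d(kappa) cancels in the E-step, and neither kappa nor the weights are re-estimated, whereas the abstract describes estimating the parameters of the mixture. The function name `vmf_mixture_em`, its parameters, and the farthest-first initialization are hypothetical choices made for illustration. The soft variant computes posteriors proportional to exp(kappa * mu_h^T x). With `hard=True`, each point is assigned to its most similar mean, which turns the loop into spherical k-means.

```python
import math


def normalize(v):
    """Project a nonzero vector onto the unit hypersphere (unit L2 norm)."""
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]


def dot(a, b):
    """Inner product; for unit vectors this is the cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


def vmf_mixture_em(X, k, kappa=10.0, hard=False, iters=30):
    """Estimate mean directions of a k-component vMF mixture (illustrative sketch).

    Simplifying assumptions (not the full method): every component shares the
    same fixed concentration kappa and the same prior 1/k, so the normalizer
    C_d(kappa) cancels and p(h | x) is proportional to exp(kappa * mu_h . x).
    hard=True replaces the posterior with a 0/1 argmax assignment, which makes
    the loop spherical k-means.
    """
    X = [normalize(x) for x in X]
    # Hypothetical farthest-first initialization (deterministic): start from
    # X[0], then repeatedly add the point least similar to the chosen means.
    mus = [X[0]]
    while len(mus) < k:
        mus.append(min(X, key=lambda x: max(dot(mu, x) for mu in mus)))
    dim = len(X[0])
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in X:
            scores = [kappa * dot(mu, x) for mu in mus]
            if hard:
                best = max(range(k), key=lambda h: scores[h])
                resp.append([1.0 if h == best else 0.0 for h in range(k)])
            else:
                top = max(scores)  # subtract the max for numerical stability
                w = [math.exp(s - top) for s in scores]
                z = sum(w)
                resp.append([wi / z for wi in w])
        # M-step: mean direction = normalized responsibility-weighted sum.
        new_mus = []
        for h in range(k):
            r = [sum(resp[i][h] * X[i][j] for i in range(len(X))) for j in range(dim)]
            norm = math.sqrt(sum(c * c for c in r))
            # Keep the old mean if a component received no mass (empty cluster).
            new_mus.append([c / norm for c in r] if norm > 0 else mus[h])
        mus = new_mus
    # Final labels: the component whose mean has the highest cosine similarity.
    labels = [max(range(k), key=lambda h: dot(mus[h], x)) for x in X]
    return mus, labels
```

Consider two tight groups of 3-D points, one near (1, 0, 0) and one near (0, 1, 0). On data like this, the soft and hard variants should return the same partition. Raising `kappa` pushes the soft posteriors toward the hard 0/1 assignments, which is one way to see the special-case relationship to spherical k-means described in the abstract.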
Type Conference
Year 2003
Where KDD
Authors Arindam Banerjee, Inderjit S. Dhillon, Joydeep Ghosh, Suvrit Sra