NIPS
2004

A Probabilistic Model for Online Document Clustering with Application to Novelty Detection

In this paper we propose a probabilistic model for online document clustering. We use a non-parametric Dirichlet process prior to model the growing number of clusters, and use a general English language model prior as the base distribution to handle the generation of novel clusters. Furthermore, cluster uncertainty is modeled with a Bayesian Dirichlet-multinomial distribution. We use an empirical Bayes method to estimate hyperparameters based on a historical dataset. Our probabilistic model is applied to the novelty detection task in Topic Detection and Tracking (TDT) and compared with existing approaches in the literature.
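The ingredients named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a fixed vocabulary given by a hypothetical `base_lm` dictionary (word to probability, all probabilities positive), hand-set hyperparameters `concentration` and `strength` in place of the empirical Bayes estimates, and a greedy assignment of each document to its most probable cluster, which is a simplification chosen here for brevity. All class and function names are illustrative. The novelty score is taken to be the posterior probability that a document starts a new cluster.

```python
import math
from collections import Counter


def log_dirichlet_multinomial(doc, pseudo_counts):
    """Log probability of a word sequence with counts `doc` under a
    Dirichlet-multinomial with Dirichlet parameters `pseudo_counts`."""
    total_alpha = sum(pseudo_counts.values())
    n = sum(doc.values())
    logp = math.lgamma(total_alpha) - math.lgamma(total_alpha + n)
    for word, count in doc.items():
        alpha = pseudo_counts[word]
        logp += math.lgamma(alpha + count) - math.lgamma(alpha)
    return logp


class OnlineDPClusterer:
    """Illustrative online clusterer (hypothetical names, assumed settings)."""

    def __init__(self, base_lm, concentration=1.0, strength=1.0):
        self.base_lm = base_lm              # assumed general language model
        self.concentration = concentration  # Dirichlet process concentration
        self.strength = strength            # total pseudo-count of the base prior
        self.clusters = []                  # entries: [word Counter, document count]

    def _posteriors(self, tokens):
        # Out-of-vocabulary tokens are dropped (an assumption of this sketch).
        doc = Counter(t for t in tokens if t in self.base_lm)
        prior = {w: self.strength * p for w, p in self.base_lm.items()}
        # Index 0 is the "new cluster" option, weighted by the concentration.
        log_w = [math.log(self.concentration)
                 + log_dirichlet_multinomial(doc, prior)]
        # Existing clusters are weighted by how many documents they hold.
        for counts, n_docs in self.clusters:
            posterior = {w: a + counts[w] for w, a in prior.items()}
            log_w.append(math.log(n_docs)
                         + log_dirichlet_multinomial(doc, posterior))
        top = max(log_w)                    # subtract the max for stability
        weights = [math.exp(x - top) for x in log_w]
        z = sum(weights)
        return doc, [w / z for w in weights]

    def novelty_score(self, tokens):
        """Posterior probability that the document belongs to a new cluster."""
        return self._posteriors(tokens)[1][0]

    def add(self, tokens):
        """Score the document, then greedily assign it and update counts."""
        doc, probs = self._posteriors(tokens)
        best = max(range(len(probs)), key=probs.__getitem__)
        if best == 0:
            self.clusters.append([Counter(doc), 1])
        else:
            self.clusters[best - 1][0].update(doc)
            self.clusters[best - 1][1] += 1
        return probs[0]
```

In use, each incoming document would be passed to `add`, and the returned score compared against a threshold to flag it as novel. Both the threshold and the hyperparameter values are left open here, since the abstract only says that hyperparameters are estimated from a historical dataset.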
Jian Zhang 0003, Zoubin Ghahramani, Yiming Yang
Added 31 Oct 2010
Updated 31 Oct 2010
Type Conference
Year 2004
Where NIPS