Sciweavers

» Modeling word burstiness using the Dirichlet distribution
ICML
2005
IEEE
14 years 5 months ago
Modeling word burstiness using the Dirichlet distribution
Multinomial distributions are often used to model text documents. However, they do not capture well the phenomenon that words in a document tend to appear in bursts: if a word app...
Rasmus Elsborg Madsen, David Kauchak, Charles Elka...
ICML
2006
IEEE
14 years 5 months ago
Clustering documents with an exponential-family approximation of the Dirichlet compound multinomial distribution
The Dirichlet compound multinomial (DCM) distribution, also called the multivariate Polya distribution, is a model for text documents that takes into account burstiness: the fact ...
Charles Elkan
KDD
2010
ACM
13 years 8 months ago
Topic models with power-law using Pitman-Yor process
One of the important approaches for knowledge discovery and data mining is to estimate unobserved variables, because latent variables can indicate hidden and specific properties o...
Issei Sato, Hiroshi Nakagawa
ICML
2009
IEEE
14 years 5 months ago
Incorporating domain knowledge into topic modeling via Dirichlet Forest priors
Users of topic modeling methods often have knowledge about the composition of words that should have high or low probability in various topics. We incorporate such domain knowledg...
David Andrzejewski, Xiaojin Zhu, Mark Craven
ICASSP
2011
IEEE
12 years 8 months ago
Unsupervised determination of efficient Korean LVCSR units using a Bayesian Dirichlet process model
Korean is an agglutinative language that does not have explicit word boundaries. It is also a highly inflective language that exhibits severe coarticulation effects. These charac...
Sakriani Sakti, Andrew M. Finch, Ryosuke Isotani, ...