ICML
2010
IEEE

The IBP Compound Dirichlet Process and its Application to Focused Topic Modeling

The hierarchical Dirichlet process (HDP) is a Bayesian nonparametric mixed membership model: each data point is modeled with a collection of components in different proportions. Though powerful, the HDP assumes that the probability of a component being exhibited by a data point is positively correlated with its proportion within that data point. This assumption can be undesirable. For example, in topic modeling, a topic (component) might be rare throughout the corpus but dominant within those documents (data points) where it occurs. We develop the IBP compound Dirichlet process (ICD), a Bayesian nonparametric prior that decouples across-data prevalence and within-data proportion in a mixed membership model. The ICD combines properties from the HDP and the Indian buffet process (IBP), a Bayesian nonparametric prior on binary matrices. The ICD assigns a subset of the shared mixture components to each data point. This subset, the data point's "focus", is d...
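The decoupling described in the abstract can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not the authors' construction: it uses a finite beta-Bernoulli approximation in place of the IBP, and the function name `sample_focused_proportions`, the hyperparameters `alpha` and `gamma`, and the truncation level `num_topics` are all hypothetical choices made for illustration. One set of variables (`prevalence`) controls how often a topic appears across documents, while a separate set (`mass`) controls how large it is within the documents that contain it.

```python
import numpy as np


def sample_focused_proportions(num_docs, num_topics, alpha=3.0, gamma=1.0, seed=0):
    """Draw per-document topic proportions restricted to a per-document focus.

    Illustrative assumption: a finite beta-Bernoulli model stands in for the IBP.
    """
    rng = np.random.default_rng(seed)

    # Across-data prevalence: probability that each topic is in a document's focus.
    prevalence = rng.beta(alpha / num_topics, 1.0, size=num_topics)

    # Within-data mass: relative size of each topic where it does occur.
    # Drawn independently of prevalence, so a rare topic can still be dominant.
    mass = rng.gamma(gamma, 1.0, size=num_topics)

    # Binary focus matrix: focus[d, k] is True if document d may use topic k.
    focus = rng.random((num_docs, num_topics)) < prevalence

    theta = np.zeros((num_docs, num_topics))
    for d in range(num_docs):
        active = np.flatnonzero(focus[d])
        if active.size == 0:
            continue  # empty focus: the row stays all zeros in this sketch
        # Dirichlet over the selected topics only; all others get zero weight.
        theta[d, active] = rng.dirichlet(mass[active])

    return prevalence, focus, theta


if __name__ == "__main__":
    prevalence, focus, theta = sample_focused_proportions(num_docs=5, num_topics=8)
    print(focus.astype(int))
    print(theta.round(2))
```

In this sketch a topic with low `prevalence` but high `mass` appears in few documents yet takes a large share of each one it appears in, which is the behavior the abstract says the HDP cannot express.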
Added 09 Nov 2010
Updated 09 Nov 2010
Type Conference
Year 2010
Where ICML
Authors Sinead Williamson, Chong Wang, Katherine A. Heller, David M. Blei