PCFGs, Topic Models, Adaptor Grammars and Learning Topical Collocations and the Structure of Proper Names

This paper establishes a connection between two apparently very different kinds of probabilistic models. Latent Dirichlet Allocation (LDA) models are used as "topic models" to produce a low-dimensional representation of documents, while Probabilistic Context-Free Grammars (PCFGs) define distributions over trees. The paper begins by showing that LDA topic models can be viewed as a special kind of PCFG, so Bayesian inference procedures for PCFGs can also be used to infer topic models. Adaptor Grammars (AGs) are a hierarchical, non-parametric Bayesian extension of PCFGs. Exploiting the close relationship between LDA and PCFGs just described, we propose two novel probabilistic models that combine insights from LDA and AG models. The first replaces the unigram component of LDA topic models with multi-word sequences, or collocations, generated by an AG. The second builds on the first to learn aspects of the internal structure of proper names.
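The LDA-to-PCFG correspondence described above can be illustrated with a small sketch. It assumes a simplified grammar in which each document's word string is prefixed by a document marker `_d`. A left-recursive spine `Doc'_d` produces one `Doc_d` node per word, `Doc_d → Topic_t` carries the document-topic probability θ[d][t], and `Topic_t → w` carries the topic-word probability φ[t][w]. The nonterminal names, the uniform choice of document, and the `stop` parameter are illustrative assumptions, not necessarily the paper's exact construction. The check at the end shows that a derivation's probability equals the LDA term Π θ·φ times factors that do not depend on the topic assignments.

```python
from math import isclose, prod


def lda_as_pcfg(theta, phi, stop=0.1):
    """Encode an LDA model as PCFG rules (simplified, illustrative sketch).

    theta[d][t] -- probability of topic t in document d
    phi[t][w]   -- probability of word w under topic t
    stop        -- hypothetical probability of ending a document's spine
    Returns {(lhs, rhs_tuple): probability}.
    """
    n_docs = len(theta)
    rules = {}
    for d, topic_probs in enumerate(theta):
        # Pick which document to generate (uniform: an assumption).
        rules[("Sentence", (f"Doc'_{d}",))] = 1.0 / n_docs
        # Left-recursive spine; its leftmost leaf is the document marker _d.
        rules[(f"Doc'_{d}", (f"_{d}",))] = stop
        rules[(f"Doc'_{d}", (f"Doc'_{d}", f"Doc_{d}"))] = 1.0 - stop
        # Each word position chooses a topic with probability theta[d][t].
        for t, p in enumerate(topic_probs):
            rules[(f"Doc_{d}", (f"Topic_{t}",))] = p
    # Each topic nonterminal rewrites to a word with probability phi[t][w].
    for t, word_probs in enumerate(phi):
        for w, p in word_probs.items():
            rules[(f"Topic_{t}", (w,))] = p
    return rules


def is_normalized(rules):
    """True if the rules expanding each nonterminal sum to 1."""
    totals = {}
    for (lhs, _), p in rules.items():
        totals[lhs] = totals.get(lhs, 0.0) + p
    return all(isclose(s, 1.0) for s in totals.values())


def derivation_prob(rules, d, words, topics):
    """Probability of the tree for document d under given topic assignments."""
    used = [("Sentence", (f"Doc'_{d}",)), (f"Doc'_{d}", (f"_{d}",))]
    for w, t in zip(words, topics):
        used.append((f"Doc'_{d}", (f"Doc'_{d}", f"Doc_{d}")))
        used.append((f"Doc_{d}", (f"Topic_{t}",)))
        used.append((f"Topic_{t}", (w,)))
    return prod(rules[r] for r in used)


# Toy model: 2 documents, 2 topics (all numbers are made up).
theta = [[0.9, 0.1], [0.2, 0.8]]
phi = [{"gene": 0.7, "cell": 0.3}, {"court": 0.6, "law": 0.4}]
rules = lda_as_pcfg(theta, phi, stop=0.1)

words, topics = ["gene", "court"], [0, 1]
pcfg_p = derivation_prob(rules, 0, words, topics)
# LDA term for the same assignments: prod over positions of theta[d][z] * phi[z][w].
lda_p = prod(theta[0][t] * phi[t][w] for w, t in zip(words, topics))
# Document choice and length factors, independent of the topic assignments.
length_factor = (1 / len(theta)) * 0.1 * (0.9 ** len(words))
print(is_normalized(rules), isclose(pcfg_p, lda_p * length_factor))
```

Because the extra factors are constant once the document and its words are observed, the two models rank topic assignments identically. This is why, under these assumptions, PCFG inference with Dirichlet priors on the `Doc_d` and `Topic_t` rules behaves like LDA inference.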

Mark Johnson

ACL 2010 | Computational Linguistics | LDA Topic Models | Probabilistic Models | Topic Models

Post Info

Added	10 Feb 2011
Updated	10 Feb 2011
Type	Journal
Year	2010
Where	ACL
Authors	Mark Johnson
