Learning author-topic models from text corpora



We propose a new unsupervised learning technique for extracting information about authors and topics from large text collections. We model documents as if they were generated by a two-stage stochastic process. An author is represented by a probability distribution over topics, and each topic is represented as a probability distribution over words. The probability distribution over topics in a multi-author paper is a mixture of the distributions associated with the authors. The topic-word and author-topic distributions are learned from data in an unsupervised manner using a Markov chain Monte Carlo algorithm. We apply the methodology to three large text corpora: 150,000 abstracts from the CiteSeer digital library, 1,740 papers from the Neural Information Processing Systems Conference (NIPS), and 121,000 emails from a large corporation. We discuss in detail the interpretation of the results discovered by the system including specific topic and author models, ranking of authors by topic and topi...
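The two-stage process and its MCMC fitting can be sketched in a few lines. The abstract only says that a Markov chain Monte Carlo algorithm is used; the sketch below swaps in collapsed Gibbs sampling of an (author, topic) pair for every word, which is one standard MCMC scheme for this kind of model. The function names `generate_corpus` and `gibbs_author_topic`, the symmetric Dirichlet hyperparameters `alpha` and `beta`, the iteration count, and the equal weighting of co-authors are all assumptions chosen for illustration, not details taken from the paper.

```python
import random


def generate_corpus(doc_authors, theta, phi, doc_len, rng):
    """Sample word-index documents from a two-stage author-topic process.

    doc_authors[d]: author indices of document d (hypothetical layout)
    theta[a][t]:    P(topic t | author a)
    phi[t][w]:      P(word w | topic t)
    """
    n_topics, n_words = len(phi), len(phi[0])
    docs = []
    for authors in doc_authors:
        words = []
        for _ in range(doc_len):
            a = rng.choice(authors)  # assumption: co-authors weighted equally
            t = rng.choices(range(n_topics), weights=theta[a])[0]  # stage 1: topic
            w = rng.choices(range(n_words), weights=phi[t])[0]     # stage 2: word
            words.append(w)
        docs.append(words)
    return docs


def gibbs_author_topic(docs, doc_authors, n_authors, n_topics, n_words,
                       alpha=0.5, beta=0.01, n_iter=200, seed=0):
    """Hypothetical collapsed Gibbs sampler; returns estimates of theta and phi."""
    rng = random.Random(seed)
    wt = [[0] * n_topics for _ in range(n_words)]    # word-topic counts
    at = [[0] * n_topics for _ in range(n_authors)]  # author-topic counts
    t_tot = [0] * n_topics                            # tokens per topic
    a_tot = [0] * n_authors                           # tokens per author

    def update(w, a, t, delta):
        wt[w][t] += delta
        at[a][t] += delta
        t_tot[t] += delta
        a_tot[a] += delta

    # Random initial (author, topic) assignment for every token.
    assign = []
    for d, words in enumerate(docs):
        row = []
        for w in words:
            a, t = rng.choice(doc_authors[d]), rng.randrange(n_topics)
            update(w, a, t, +1)
            row.append((a, t))
        assign.append(row)

    for _ in range(n_iter):
        for d, words in enumerate(docs):
            for i, w in enumerate(words):
                update(w, *assign[d][i], -1)  # exclude the current token
                pairs, weights = [], []
                for a in doc_authors[d]:
                    for t in range(n_topics):
                        # Unnormalized P(author=a, topic=t | all other tokens)
                        p_word = (wt[w][t] + beta) / (t_tot[t] + n_words * beta)
                        p_topic = (at[a][t] + alpha) / (a_tot[a] + n_topics * alpha)
                        pairs.append((a, t))
                        weights.append(p_word * p_topic)
                a, t = rng.choices(pairs, weights=weights)[0]
                update(w, a, t, +1)
                assign[d][i] = (a, t)

    # Posterior-mean style estimates from the final sample's counts.
    theta = [[(at[a][t] + alpha) / (a_tot[a] + n_topics * alpha)
              for t in range(n_topics)] for a in range(n_authors)]
    phi = [[(wt[w][t] + beta) / (t_tot[t] + n_words * beta)
            for w in range(n_words)] for t in range(n_topics)]
    return theta, phi


if __name__ == "__main__":
    rng = random.Random(1)
    theta_true = [[0.9, 0.1], [0.1, 0.9]]  # toy data: two authors, two topics
    phi_true = [[0.45, 0.45, 0.05, 0.05], [0.05, 0.05, 0.45, 0.45]]
    doc_authors = [[0], [1], [0, 1], [0], [1]]
    docs = generate_corpus(doc_authors, theta_true, phi_true, 30, rng)
    theta, phi = gibbs_author_topic(docs, doc_authors, 2, 2, 4, n_iter=100)
    print([[round(p, 2) for p in row] for row in theta])
```

Topics are identified only up to a permutation, so the recovered topic columns may come out swapped relative to `theta_true`. Sampling the author and topic jointly lets a multi-author document attribute each word to whichever co-author best explains it.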

Michal Rosen-Zvi, Chaitanya Chemudugunta, Thomas L


Large Text | Markov Chain Monte Carlo | Probability Distribution | TOIS 2010 |


» Probabilistic author-topic models for information discovery

» Labeled LDA: A supervised topic model for credit attribution in multi-labeled corpora

» Active Learning for Multilingual Statistical Machine Translation

» Word Order Acquisition from Corpora

» Knowledge discovery through directed probabilistic topic models: a survey

» Learning a model of speaker head nods using gesture corpora

» Learning Visual Entities and Their Visual Attributes from Text Corpora

» Hierarchical Orderings of Textual Units

Post Info

Added	31 Jan 2011
Updated	31 Jan 2011
Type	Journal
Year	2010
Where	TOIS
Authors	Michal Rosen-Zvi, Chaitanya Chemudugunta, Thomas L. Griffiths, Padhraic Smyth, Mark Steyvers


Sciweavers
