Sciweavers

TOIS
2010

Learning author-topic models from text corpora

We propose a new unsupervised learning technique for extracting information about authors and topics from large text collections. We model documents as if they were generated by a two-stage stochastic process. An author is represented by a probability distribution over topics, and each topic is represented as a probability distribution over words. The probability distribution over topics in a multi-author paper is a mixture of the distributions associated with the authors. The topic-word and author-topic distributions are learned from data in an unsupervised manner using a Markov chain Monte Carlo algorithm. We apply the methodology to three large text corpora: 150,000 abstracts from the CiteSeer digital library, 1,740 papers from the Neural Information Processing Systems Conference (NIPS), and 121,000 emails from a large corporation. We discuss in detail the interpretation of the results discovered by the system, including specific topic and author models, ranking of authors by topic and topi...
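The two-stage generative process described in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function name `generate_document`, the toy author and topic names, and all probabilities are made up for the example, and it assumes that each word's author is picked uniformly at random from the document's co-authors (one simple way to obtain the mixture the abstract mentions). It shows only the forward sampling direction; the Markov chain Monte Carlo learning step is not shown.

```python
import random

def generate_document(authors, author_topic, topic_word, n_words, rng):
    """Sample the words of one document under the two-stage process."""
    words = []
    for _ in range(n_words):
        # Assumption: the author responsible for each word is chosen uniformly
        # from the co-authors, giving an equal-weight mixture over authors.
        author = rng.choice(authors)
        # Stage 1: draw a topic from that author's distribution over topics.
        topics, topic_probs = zip(*author_topic[author].items())
        topic = rng.choices(topics, weights=topic_probs, k=1)[0]
        # Stage 2: draw a word from that topic's distribution over words.
        vocab, word_probs = zip(*topic_word[topic].items())
        words.append(rng.choices(vocab, weights=word_probs, k=1)[0])
    return words

# Toy, hypothetical distributions (not results from the paper).
author_topic = {
    "author_a": {"learning": 0.9, "retrieval": 0.1},
    "author_b": {"learning": 0.2, "retrieval": 0.8},
}
topic_word = {
    "learning": {"network": 0.5, "training": 0.3, "kernel": 0.2},
    "retrieval": {"query": 0.6, "index": 0.3, "ranking": 0.1},
}

doc = generate_document(
    ["author_a", "author_b"], author_topic, topic_word, 20, random.Random(0)
)
print(" ".join(doc))
```

Learning runs in the opposite direction: given only the observed words and author lists, the model infers the author-topic and topic-word distributions that plausibly produced them.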
Added 31 Jan 2011
Updated 31 Jan 2011
Type Journal
Year 2010
Where TOIS
Authors Michal Rosen-Zvi, Chaitanya Chemudugunta, Thomas L. Griffiths, Padhraic Smyth, Mark Steyvers