Sciweavers

EMNLP
2009

Labeled LDA: A supervised topic model for credit attribution in multi-labeled corpora

13 years 2 months ago
Labeled LDA: A supervised topic model for credit attribution in multi-labeled corpora
A significant portion of the world's text is tagged by readers on social bookmarking websites. Credit attribution is an inherent problem in these corpora because most pages have multiple tags, but the tags do not always apply with equal specificity across the whole document. Solving the credit attribution problem requires associating each word in a document with the most appropriate tags and vice versa. This paper introduces Labeled LDA, a topic model that constrains Latent Dirichlet Allocation by defining a one-to-one correspondence between LDA's latent topics and user tags. This allows Labeled LDA to directly learn word-tag correspondences. We demonstrate Labeled LDA's improved expressiveness over traditional LDA with visualizations of a corpus of tagged web pages from del.icio.us. Labeled LDA outperforms SVMs by more than 3 to 1 when extracting tag-specific document snippets. As a multi-label text classifier, our model is competitive with a discriminative baseline on...
Daniel Ramage, David Hall, Ramesh Nallapati, Chris
Added 17 Feb 2011
Updated 17 Feb 2011
Type Journal
Year 2009
Where EMNLP
Authors Daniel Ramage, David Hall, Ramesh Nallapati, Christopher D. Manning
Comments (0)