Sciweavers

ICDM
2007
IEEE

Sparse Word Graphs: A Scalable Algorithm for Capturing Word Correlations in Topic Models

Statistical topic models such as Latent Dirichlet Allocation (LDA) have emerged as an attractive framework for modeling, visualizing, and summarizing large document collections in a completely unsupervised fashion. One limitation of this family of models is the assumption that words within a document are exchangeable, which results in a ‘bag-of-words’ representation for documents as well as topics. As a consequence, the valuable information carried by correlations between words is lost in these models. In this work, we adapt recent advances in sparse modeling techniques to the problem of modeling word correlations within topics and present a new algorithm called Sparse Word Graphs. Our experiments on the AP corpus reveal both long-distance and short-distance word correlations within topics that are semantically meaningful. In addition, the new algorithm scales to large collections because it captures only the most important correlations, in a sparse manner.
Added 03 Jun 2010
Updated 03 Jun 2010
Type Conference
Year 2007
Where ICDM
Authors Ramesh Nallapati, Amr Ahmed, William W. Cohen, Eric P. Xing