Hierarchical probabilistic modeling of discrete data has emerged as a powerful tool for text analysis. Posterior inference in such models is intractable, and practitioners rely on...
We investigate the problem of learning a widely-used latent-variable model – the Latent Dirichlet Allocation (LDA) or “topic” model – using distributed computation, where ...
David Newman, Arthur Asuncion, Padhraic Smyth, Max...
Analyzing the author and topic relations in email corpus is an important issue in both social network analysis and text mining. The AuthorTopic model is a statistical model that id...
The aim of query-based sampling is to obtain a sufficient, representative sample of an underlying (text) collection. Current measures for assessing sample quality are too coarse gr...
Low-dimensional topic models have been proven very useful for modeling a large corpus of documents that share a relatively small number of topics. Dimensionality reduction tools s...