A family of probabilistic time series models is developed to analyze the time evolution of topics in large document collections. The approach is to use state space models on the n...
This paper considers dynamic language model adaptation for Mandarin broadcast news recognition. Both contemporary newswire texts and in-domain automatic transcripts were exploited...
Abstract. Statistical debugging uses machine learning to model program failures and help identify root causes of bugs. We approach this task using a novel Delta-Latent-Dirichlet-Al...
David Andrzejewski, Anne Mulhern, Ben Liblit, Xiao...
Some models of textual corpora employ text generation methods involving n-gram statistics, while others use latent topic variables inferred using the "bag-of-words" assu...
In this paper, we review two techniques for topic discovery in collections of text documents (Latent Semantic Indexing and K-Means clustering) and present how we integrated them in...