In this paper, we present a morphosyntactic tagset for Afrikaans based on the guidelines developed by the Expert Advisory Group on Language Engineering Standards (EAGLES). We comp...
We investigate the problem of learning a widely-used latent-variable model – the Latent Dirichlet Allocation (LDA) or “topic” model – using distributed computation, where ...
David Newman, Arthur Asuncion, Padhraic Smyth, Max...
In text categorization, term weighting methods assign appropriate weights to the terms to improve the classification performance. In this study, we propose an effective term weigh...
Given a noisy text page, a word recognizer can generate a set of candidates for each word image. A relaxation algorithm was proposed previously by the authors that uses word collo...
We develop the distance dependent Chinese restaurant process (CRP), a flexible class of distributions over partitions that allows for nonexchangeability. This class can be used to...