Sciweavers

735 search results - page 115 / 147
» Corpora and data preparation
LREC
2010
14 years 11 months ago
The Development of a Morphosyntactic Tagset for Afrikaans and its Use with Statistical Tagging
In this paper, we present a morphosyntactic tagset for Afrikaans based on the guidelines developed by the Expert Advisory Group on Language Engineering Standards (EAGLES). We comp...
Boris Haselbach, Ulrich Heid
NIPS
2007
14 years 11 months ago
Distributed Inference for Latent Dirichlet Allocation
We investigate the problem of learning a widely-used latent-variable model – the Latent Dirichlet Allocation (LDA) or “topic” model – using distributed computation, where ...
David Newman, Arthur Asuncion, Padhraic Smyth, Max...
AAAI
2006
14 years 11 months ago
Proposing a New Term Weighting Scheme for Text Categorization
In text categorization, term weighting methods assign appropriate weights to the terms to improve the classification performance. In this study, we propose an effective term weigh...
Man Lan, Chew Lim Tan, Hwee-Boon Low
ANLP
1994
14 years 11 months ago
Degraded Text Recognition Using Word Collocation and Visual Inter-Word Constraints
Given a noisy text page, a word recognizer can generate a set of candidates for each word image. A relaxation algorithm was proposed previously by the authors that uses word collo...
Tao Hong, Jonathan J. Hull
ICML
2010
14 years 11 months ago
Distance dependent Chinese restaurant processes
We develop the distance dependent Chinese restaurant process (CRP), a flexible class of distributions over partitions that allows for nonexchangeability. This class can be used to...
David M. Blei, Peter Frazier