Part-of-speech (POS) tag distributions are known to exhibit sparsity: a word is likely to take a single predominant tag in a corpus. Recent research has demonstrated that incorp...
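One way to make the sparsity claim concrete is to measure how many tokens in a tagged corpus carry their word's most frequent tag. The sketch below assumes the corpus is a flat list of `(word, tag)` pairs. The function name `predominant_tag_coverage` and the toy corpus are hypothetical illustrations; a real measurement would run over an annotated treebank.

```python
from collections import Counter, defaultdict


def predominant_tag_coverage(tagged_tokens):
    """Fraction of tokens whose tag equals their word type's most frequent tag.

    tagged_tokens: iterable of (word, tag) pairs (hypothetical input format).
    A value close to 1.0 means tag distributions per word are highly sparse.
    """
    counts = defaultdict(Counter)
    for word, tag in tagged_tokens:
        counts[word][tag] += 1
    total = sum(sum(c.values()) for c in counts.values())
    # For each word type, count only the tokens tagged with its majority tag.
    covered = sum(c.most_common(1)[0][1] for c in counts.values())
    return covered / total if total else 0.0


# Hypothetical toy corpus: only one "run" token deviates from its word's majority tag.
toy = [("the", "DT"), ("dog", "NN"), ("runs", "VBZ"), ("the", "DT"),
       ("run", "NN"), ("run", "VB"), ("run", "VB"), ("dog", "NN")]
print(predominant_tag_coverage(toy))  # 0.875
```

Here 7 of the 8 tokens carry their word's predominant tag, which is the kind of skew the abstract describes.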
We propose a new model for unsupervised POS tagging based on linguistic distinctions between open and closed-class items. Exploiting notions from current linguistic theory, the sy...
We define the crouching Dirichlet, hidden Markov model (CDHMM), an HMM for part-of-speech tagging that draws state prior distributions for each local document context. This simple...
POS tagging is known to be less accurate on unknown words (words that the POS tagger has not seen in its training corpora) than on known words. Thus, a first step to improve the tagging ac...
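The known/unknown distinction is usually made operational by splitting test-set accuracy into two buckets according to whether each word type occurred in training. The sketch below assumes parallel lists of gold and predicted `(word, tag)` pairs and a set of training word types. The function `accuracy_by_known` and the example data are hypothetical and are not taken from the work described above.

```python
def accuracy_by_known(train_words, gold, predicted):
    """Split tagging accuracy into (known-word accuracy, unknown-word accuracy).

    train_words: set of word types seen in the training corpus.
    gold, predicted: parallel lists of (word, tag) pairs for the test set.
    A bucket with no tokens yields None instead of dividing by zero.
    """
    hits = {True: 0, False: 0}
    totals = {True: 0, False: 0}
    for (word, gold_tag), (_, pred_tag) in zip(gold, predicted):
        known = word in train_words
        totals[known] += 1
        hits[known] += gold_tag == pred_tag  # bool counts as 0 or 1

    def ratio(bucket):
        return hits[bucket] / totals[bucket] if totals[bucket] else None

    return ratio(True), ratio(False)


# Hypothetical example: both known words are tagged correctly,
# but one of the two unseen words ("blorked") is mistagged.
train = {"the", "dog", "barks"}
gold = [("the", "DT"), ("dog", "NN"), ("blorked", "VBD"), ("zebra", "NN")]
pred = [("the", "DT"), ("dog", "NN"), ("blorked", "NN"), ("zebra", "NN")]
print(accuracy_by_known(train, gold, pred))  # (1.0, 0.5)
```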
Dan Tufis, Elena Irimia, Radu Ion, Alexandru Ceaus...
We present a novel approach to distributional-only, fully unsupervised, POS tagging, based on an adaptation of the EM algorithm for the estimation of a Gaussian mixture. In this ap...
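The abstract does not spell out how EM is adapted, so the following is only a reference point: standard EM for a diagonal-covariance Gaussian mixture. It assumes each word is represented by a hypothetical distributional context vector (one row of `X`) and that each mixture component is read as an induced tag cluster. The function name `gmm_em`, the farthest-first initialization, and the variance floor `eps` are assumptions of this sketch, not details from the paper.

```python
import numpy as np


def gmm_em(X, k, n_iter=50, seed=0, eps=1e-6):
    """Fit a diagonal-covariance Gaussian mixture to the rows of X with plain EM.

    Returns (weights, means, variances, responsibilities), where the
    responsibilities come from the final E-step.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Farthest-first initialization (an assumption of this sketch).
    idx = [int(rng.integers(n))]
    for _ in range(1, k):
        dist = np.min(((X[:, None, :] - X[idx][None]) ** 2).sum(axis=2), axis=1)
        idx.append(int(dist.argmax()))
    means = X[idx].copy()                                  # (k, d)
    variances = np.tile(X.var(axis=0) + eps, (k, 1))       # (k, d)
    weights = np.full(k, 1.0 / k)                          # (k,)
    for _ in range(n_iter):
        # E-step: log pi_j + log N(x | mu_j, diag(var_j)) for each point and component.
        diff = X[:, None, :] - means[None, :, :]           # (n, k, d)
        log_p = -0.5 * (np.sum(diff ** 2 / variances[None], axis=2)
                        + np.sum(np.log(2 * np.pi * variances), axis=1)[None])
        log_p += np.log(weights)[None]
        log_p -= log_p.max(axis=1, keepdims=True)          # stabilize before exp
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)            # (n, k), rows sum to 1
        # M-step: re-estimate mixture weights, means, and per-dimension variances.
        nk = resp.sum(axis=0) + eps                        # (k,)
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        diff = X[:, None, :] - means[None, :, :]
        variances = np.einsum("nk,nkd->kd", resp, diff ** 2) / nk[:, None] + eps
    return weights, means, variances, resp


# Hypothetical usage: two well-separated groups of 2-D "context vectors".
data_rng = np.random.default_rng(1)
X = np.vstack([data_rng.normal(0.0, 0.1, (20, 2)),
               data_rng.normal(5.0, 0.1, (20, 2))])
w, mu, var, resp = gmm_em(X, k=2)
labels = resp.argmax(axis=1)  # hard cluster ("tag") assignment per word
```

Taking `argmax` over the responsibilities gives a hard cluster label per word. A fully unsupervised tagger would still need to map those clusters onto a tag set before evaluation.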