
EMNLP
2010

Latent-Descriptor Clustering for Unsupervised POS Induction

We present a novel approach to distributional-only, fully unsupervised POS tagging, based on an adaptation of the EM algorithm for the estimation of a Gaussian mixture. In this approach, which we call Latent-Descriptor Clustering (LDC), word types are clustered using a series of progressively more informative descriptor vectors. These descriptors, which are computed from the immediate left and right context of each word in the corpus, are updated based on the previous state of the cluster assignments. The LDC algorithm is simple and intuitive. Using standard evaluation criteria for unsupervised POS tagging, LDC shows a substantial improvement in performance over state-of-the-art methods, along with a several-fold reduction in computational cost.
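The loop described above (descriptors built from left and right neighbors, then re-estimated from the previous cluster assignments inside a Gaussian-mixture EM) can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: each descriptor is the normalized sum of the soft cluster memberships of a type's left neighbors concatenated with that of its right neighbors, the mixture uses spherical Gaussians with a fixed shared variance `sigma2`, and initialization is random. All function names (`context_descriptors`, `m_step`, `e_step`, `ldc_sketch`) are hypothetical. This is an illustration of the general idea, not the authors' published algorithm.

```python
import math
import random
from collections import defaultdict


def context_descriptors(corpus, resp, k):
    """Hypothetical descriptor: for each word type, the normalized sum of its
    left neighbors' soft memberships followed by its right neighbors' (length 2k)."""
    sums = defaultdict(lambda: [0.0] * (2 * k))
    for i, w in enumerate(corpus):
        vec = sums[w]  # ensure every type gets a descriptor
        if i > 0:
            for c, r in enumerate(resp[corpus[i - 1]]):
                vec[c] += r
        if i + 1 < len(corpus):
            for c, r in enumerate(resp[corpus[i + 1]]):
                vec[k + c] += r
    desc = {}
    for w, v in sums.items():
        lt = sum(v[:k]) or 1.0  # avoid dividing by zero at corpus edges
        rt = sum(v[k:]) or 1.0
        desc[w] = [x / lt for x in v[:k]] + [x / rt for x in v[k:]]
    return desc


def m_step(desc, resp, k):
    """Re-estimate cluster means and mixture weights from soft assignments."""
    dim = len(next(iter(desc.values())))
    means = [[0.0] * dim for _ in range(k)]
    mass = [0.0] * k
    for w, x in desc.items():
        for c in range(k):
            r = resp[w][c]
            mass[c] += r
            for j in range(dim):
                means[c][j] += r * x[j]
    for c in range(k):
        if mass[c] > 0:
            means[c] = [v / mass[c] for v in means[c]]
    weights = [m / len(desc) for m in mass]
    return means, weights


def e_step(desc, means, weights, sigma2):
    """Posterior cluster memberships under spherical Gaussians (assumed fixed variance)."""
    resp = {}
    for w, x in desc.items():
        logp = []
        for c, mu in enumerate(means):
            d2 = sum((a - b) ** 2 for a, b in zip(x, mu))
            logp.append(math.log(max(weights[c], 1e-12)) - d2 / (2 * sigma2))
        top = max(logp)  # log-sum-exp stabilization
        p = [math.exp(l - top) for l in logp]
        z = sum(p)
        resp[w] = [q / z for q in p]
    return resp


def ldc_sketch(corpus, k, iters=20, sigma2=0.05, seed=0):
    """Alternate: rebuild descriptors from current memberships, then one EM step."""
    rng = random.Random(seed)
    resp = {}
    for w in sorted(set(corpus)):
        p = [rng.random() + 1e-3 for _ in range(k)]  # random soft initialization
        z = sum(p)
        resp[w] = [q / z for q in p]
    for _ in range(iters):
        desc = context_descriptors(corpus, resp, k)
        means, weights = m_step(desc, resp, k)
        resp = e_step(desc, means, weights, sigma2)
    return {w: max(range(k), key=lambda c: resp[w][c]) for w in resp}
```

For example, `ldc_sketch("the dog runs . a cat sleeps .".split(), k=3)` returns a dictionary mapping each word type to a hard cluster index. Because each descriptor depends on the neighbors' memberships from the previous iteration, the vectors become more informative as the clustering sharpens, which is the intuition the abstract describes.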
Michael Lamar, Yariv Maron, Elie Bienenstock