Two Decades of Unsupervised POS Induction: How Far Have We Come?



Part-of-speech (POS) induction is one of the most popular tasks in research on unsupervised NLP. Many different methods have been proposed, yet comparisons are difficult to make since there is little consensus on evaluation framework, and many papers evaluate against only one or two competitor systems. Here we evaluate seven different POS induction systems spanning nearly 20 years of work, using a variety of measures. We show that some of the oldest (and simplest) systems stand up surprisingly well against more recent approaches. Since most of these systems were developed and tested using data from the WSJ corpus, we compare their generalization abilities by testing on both WSJ and the multilingual Multext-East corpus. Finally, we introduce the idea of evaluating systems based on their ability to produce cluster prototypes that are useful as input to a prototype-driven learner. In most cases, the prototype-driven learner outperforms the unsupervised system used to initialize it, yield...
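The abstract says the systems are compared "using a variety of measures" but does not name them. As an illustration only, here is a minimal sketch of one measure commonly used for clustering-based POS induction: many-to-one accuracy. Each induced cluster is mapped to the gold tag it co-occurs with most often, and tokens are then scored as if that tag had been predicted. The function name and toy data are hypothetical. This is not the paper's evaluation code, and the sketch does not assume the paper uses exactly this measure.

```python
from collections import Counter, defaultdict


def many_to_one_accuracy(induced, gold):
    """Score induced clusters against gold POS tags (many-to-one mapping).

    induced: one cluster id per token (hypothetical input format)
    gold:    one gold POS tag per token, aligned with `induced`
    """
    if len(induced) != len(gold):
        raise ValueError("induced and gold must have the same length")
    if not gold:
        return 0.0
    # Count how often each gold tag appears inside each induced cluster.
    counts = defaultdict(Counter)
    for cluster, tag in zip(induced, gold):
        counts[cluster][tag] += 1
    # Credit each cluster with the tokens of its majority gold tag.
    # Several clusters may map to the same tag, hence "many-to-one".
    correct = sum(tags.most_common(1)[0][1] for tags in counts.values())
    return correct / len(gold)


induced = [0, 0, 1, 1, 1, 2]
gold = ["DT", "DT", "NN", "NN", "VB", "NN"]
print(many_to_one_accuracy(induced, gold))  # 5 of 6 tokens match, about 0.833
```

Because several clusters may share a tag, this measure tends to rise as the number of clusters grows, which is one reason comparisons across systems usually report more than one measure.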



EMNLP 2010 | Multilingual Multext-East Corpus | Natural Language Processing | Prototype-driven Learner | Systems


Added	11 Feb 2011
Updated	11 Feb 2011
Type	Conference
Year	2010
Where	EMNLP
Authors	Christos Christodoulopoulos, Sharon Goldwater, Mark Steedman
