Sciweavers

EMNLP
2011

Unsupervised Dependency Parsing without Gold Part-of-Speech Tags

We show that categories induced by unsupervised word clustering can surpass the performance of gold part-of-speech tags in dependency grammar induction. Unlike classic clustering algorithms, our method allows a word to have different tags in different contexts. In an ablative analysis, we first demonstrate that this context-dependence is crucial to the superior performance of gold tags: requiring a word to always have the same part-of-speech significantly degrades the performance of manual tags in grammar induction, eliminating the advantage that human annotation has over unsupervised tags. We then introduce a sequence modeling technique that combines the output of a word clustering algorithm with context-colored noise, to allow words to be tagged differently in different contexts. With these new induced tags as input, our state-of-the-art dependency grammar inducer achieves 59.1% directed accuracy on Section 23 (all sentences) of the Wall Street Journal (WSJ) corpus, 0.7% highe...
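The one-tag-per-word constraint described in the abstract can be illustrated with a small sketch. It assumes the constraint is imposed by mapping every word to its most frequent tag in the corpus; the abstract does not say how the authors did it, so this choice, the function names, and the toy corpus are all hypothetical.

```python
from collections import Counter, defaultdict


def most_frequent_tag_map(tagged_sentences):
    """Map each word to the tag it carries most often in the corpus.

    Assumption: "most frequent tag" is one plausible way to force a word
    to always have the same part-of-speech; it is not taken from the paper.
    """
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: tag_counts.most_common(1)[0][0]
            for word, tag_counts in counts.items()}


def collapse_to_one_tag_per_word(tagged_sentences):
    """Re-tag the corpus so that every word type has exactly one tag."""
    tag_map = most_frequent_tag_map(tagged_sentences)
    return [[(word, tag_map[word]) for word, _ in sentence]
            for sentence in tagged_sentences]


# Hypothetical toy corpus: "can" is a modal twice and a noun once.
toy_corpus = [
    [("they", "PRP"), ("can", "MD"), ("fish", "VB")],
    [("we", "PRP"), ("can", "MD"), ("swim", "VB")],
    [("the", "DT"), ("can", "NN"), ("leaked", "VBD")],
]

collapsed = collapse_to_one_tag_per_word(toy_corpus)
for sentence in collapsed:
    print(sentence)
```

In the collapsed corpus the noun reading of "can" in the third sentence is overwritten by the modal tag. This is the kind of lost context-dependence that the ablation in the abstract measures.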
Added 20 Dec 2011
Updated 20 Dec 2011
Type Journal
Year 2011
Where EMNLP
Authors Valentin I. Spitkovsky, Hiyan Alshawi, Angel X. Chang, Daniel Jurafsky