ACL
2010

Profiting from Mark-Up: Hyper-Text Annotations for Guided Parsing

We show how web mark-up can be used to improve unsupervised dependency parsing. Starting from raw bracketings of four common HTML tags (anchors, bold, italics and underlines), we refine approximate partial phrase boundaries to yield accurate parsing constraints. Conversion procedures fall out of our linguistic analysis of a newly available million-word hyper-text corpus. We demonstrate that derived constraints aid grammar induction by training Klein and Manning's Dependency Model with Valence (DMV) on this data set: parsing accuracy on Section 23 (all sentences) of the Wall Street Journal corpus jumps to 50.4%, beating the previous state of the art by more than 5%. Web-scale experiments show that the DMV, perhaps because it is unlexicalized, does not benefit from orders of magnitude more annotated but noisier data. Our model, trained on a single blog, generalizes to 53.3% accuracy out-of-domain, against the Brown corpus, nearly 10% higher than the previous published best. The fact t...
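To make the idea of "raw bracketings" concrete, here is a minimal sketch of reading token spans off the four tags the abstract names. It is an illustration only, not the authors' conversion procedure: the names `BracketExtractor`, `extract_brackets` and `MARKUP_TAGS` are hypothetical, tokenization by whitespace is an assumption, and the refinement of brackets into parsing constraints is not attempted.

```python
from html.parser import HTMLParser

# The four tags named in the abstract: anchors, bold, italics, underlines.
MARKUP_TAGS = {"a", "b", "i", "u"}


class BracketExtractor(HTMLParser):
    """Collects whitespace tokens and (tag, start, end) token spans.

    Hypothetical helper for illustration; `end` is exclusive.
    """

    def __init__(self):
        super().__init__()
        self.tokens = []
        self.open_spans = []  # stack of (tag, index of first token inside)
        self.brackets = []    # closed spans, in closing order

    def handle_starttag(self, tag, attrs):
        if tag in MARKUP_TAGS:
            self.open_spans.append((tag, len(self.tokens)))

    def handle_endtag(self, tag):
        if tag not in MARKUP_TAGS:
            return
        # Close the most recently opened span with the same tag name.
        for k in range(len(self.open_spans) - 1, -1, -1):
            if self.open_spans[k][0] == tag:
                _, start = self.open_spans.pop(k)
                end = len(self.tokens)
                if end > start:  # skip spans that contain no tokens
                    self.brackets.append((tag, start, end))
                break

    def handle_data(self, data):
        # Assumption: whitespace tokenization. A word split by a tag
        # (e.g. "un<b>do</b>") would become two tokens here.
        self.tokens.extend(data.split())


def extract_brackets(html):
    parser = BracketExtractor()
    parser.feed(html)
    parser.close()
    return parser.tokens, parser.brackets


tokens, brackets = extract_brackets(
    'Read <a href="x">the <b>full</b> report</a> today.'
)
print(tokens)    # ['Read', 'the', 'full', 'report', 'today.']
print(brackets)  # [('b', 2, 3), ('a', 1, 4)]
```

Each span marks a candidate phrase boundary over the token sequence; in the paper's setting such spans would then be filtered and refined before being used to constrain the parser.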
Added 10 Feb 2011
Updated 10 Feb 2011
Type Journal
Year 2010
Where ACL
Authors Valentin I. Spitkovsky, Daniel Jurafsky, Hiyan Alshawi