
EMNLP 2007

Bootstrapping Feature-Rich Dependency Parsers with Entropic Priors

One may need to build a statistical parser for a new language, using only a very small labeled treebank together with raw text. We argue that bootstrapping a parser is most promising when the model uses a rich set of redundant features, as in recent models for scoring dependency parses (McDonald et al., 2005). Drawing on Abney's (2004) analysis of the Yarowsky algorithm, we perform bootstrapping by entropy regularization: we maximize a linear combination of conditional likelihood on labeled data and confidence (negative Rényi entropy) on unlabeled data. In initial experiments, this surpassed EM for training a simple feature-poor generative model, and also improved the performance of a feature-rich, conditionally estimated model where EM could not easily have been applied. For our models and training sets, more peaked measures of confidence, measured by Rényi entropy, outperformed smoother ones. We discuss how our feature set could be extended with cross-lingual or cross-domain...
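
As a rough sketch of the objective described in the abstract (not quoted from the paper; λ is an assumed mixing weight, L the labeled treebank, U the raw text, θ the model parameters, and α the Rényi order):

\max_{\theta} \;\sum_{(x,y)\in L} \log p_\theta(y \mid x) \;-\; \lambda \sum_{x \in U} H_\alpha\big(p_\theta(\cdot \mid x)\big),
\qquad
H_\alpha(p) = \frac{1}{1-\alpha}\,\log \sum_{y} p(y)^{\alpha}.

The order α controls how sharply the entropy term distinguishes peaked from diffuse parse distributions; α → 1 recovers the ordinary Shannon entropy.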
David A. Smith, Jason Eisner
Type: Conference
Year: 2007
Where: EMNLP
Authors: David A. Smith, Jason Eisner