ACL
1998

Error-Driven Pruning of Treebank Grammars for Base Noun Phrase Identification

Finding simple, non-recursive, base noun phrases is an important subtask for many natural language processing applications. While previous empirical methods for base NP identification have been rather complex, this paper instead proposes a very simple algorithm that is tailored to the relative simplicity of the task. In particular, we present a corpus-based approach for finding base NPs by matching part-of-speech tag sequences. The training phase of the algorithm is based on two successful techniques: first the base NP grammar is read from a "treebank" corpus; then the grammar is improved by selecting rules with high "benefit" scores. Using this simple algorithm with a naive heuristic for matching rules, we achieve surprising accuracy in an evaluation on the Penn Treebank Wall Street Journal.
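The two training steps and the matching step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the data layout (a sentence as a list of part-of-speech tags plus gold base NP spans), the function names, the greedy longest-match heuristic, and the definition of a rule's benefit as correct matches minus incorrect matches on a held-out corpus are all assumptions made for illustration.

```python
from collections import Counter

# Assumed layout: a sentence is (tags, spans); spans are (start, end) offsets
# of gold base NPs, end-exclusive. A "rule" is a tuple of POS tags.

def extract_rules(corpus):
    """Step 1: read the grammar off the treebank, one rule per distinct NP tag sequence."""
    rules = set()
    for tags, spans in corpus:
        for start, end in spans:
            rules.add(tuple(tags[start:end]))
    return rules

def match(tags, rules):
    """Naive matching (assumed heuristic): scan left to right, take the longest rule that fits."""
    max_len = max((len(r) for r in rules), default=0)
    found, i = [], 0
    while i < len(tags):
        for n in range(min(max_len, len(tags) - i), 0, -1):
            candidate = tuple(tags[i:i + n])
            if candidate in rules:
                found.append((i, i + n, candidate))
                i += n
                break
        else:
            i += 1  # no rule starts here; move on
    return found

def prune(rules, heldout, threshold=0):
    """Step 2: keep rules whose benefit (assumed: correct minus incorrect matches) exceeds threshold."""
    benefit = Counter()
    for tags, spans in heldout:
        gold = set(spans)
        for start, end, rule in match(tags, rules):
            benefit[rule] += 1 if (start, end) in gold else -1
    return {r for r in rules if benefit[r] > threshold}

# Toy data, invented for this example.
train = [
    (["DT", "NN", "VBD", "DT", "JJ", "NN"], [(0, 2), (3, 6)]),
    (["VBG", "NNS", "VBP", "JJ"], [(0, 2)]),
]
heldout = [
    (["PRP", "VBZ", "VBG", "NNS"], [(0, 1), (3, 4)]),
    (["DT", "NN", "VBZ", "DT", "JJ", "NN"], [(0, 2), (3, 6)]),
]

rules = extract_rules(train)
pruned = prune(rules, heldout)
found = match(["DT", "JJ", "NN", "VBD"], pruned)
```

In this toy run the rule ("VBG", "NNS") is read from the training data but produces only an incorrect match on the held-out sentences, so pruning drops it. Note that with the default threshold a rule that never fires on the held-out data has benefit 0 and is dropped as well, which is a choice of this sketch.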
Claire Cardie, David R. Pierce
Added 01 Nov 2010
Updated 01 Nov 2010
Type Conference
Year 1998
Where ACL
Authors Claire Cardie, David R. Pierce