Using Linguistic Principles to Recover Empty Categories

15 years 5 months ago

Download acl.ldc.upenn.edu

This paper describes an algorithm for detecting empty nodes in the Penn Treebank (Marcus et al., 1993), finding their antecedents, and assigning them function tags, without access to lexical information such as valency. Unlike previous approaches to this task, the current method is not corpus-based, but rather makes use of the principles of early Government-Binding theory (Chomsky, 1981), the syntactic theory that underlies the annotation. Using the evaluation metric proposed by Johnson (2002), this approach outperforms previously published approaches on both detection of empty categories and antecedent identification, given either annotated input stripped of empty categories or the output of a parser. Some problems with this evaluation metric are noted and an alternative is proposed along with the results. The paper considers the reasons a principlebased approach to this problem should outperform corpus-based approaches, and speculates on the possibility of a hybrid approach.

Richard Campbell

Real-time Traffic