Improving statistical parsing by linguistic regularization

Statistically based parsers for large corpora, in particular the Penn Treebank (PTB), have typically not used all of the linguistic information encoded in the annotated trees on which they are trained. Specifically, they have generally not used information that records the effects of derivations, such as empty categories and the representation of displaced phrases, as in passive, topicalization, and wh-constructions. Here we explore ways to use this information to "unwind" derivations, yielding a regularized underlying syntactic structure that can serve as an additional source of information for more accurate parsing. In effect, we parse with two joint sets of tree structures: the surface structure and its corresponding underlying structure, in which arguments have been restored to their canonical positions. We present a pilot experiment on passives in the PTB indicating that through the use of these two syntactic representations we can improve overall...
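As a rough illustration of what "unwinding" a passive can mean on PTB-style trees, the following is a minimal sketch, not the authors' method. It assumes the simplest annotated case: a surface subject `NP-SBJ-i` co-indexed with an object trace `(NP (-NONE- *-i))`, plus an explicit by-phrase whose NP carries the PTB `-LGS` (logical subject) function tag. The nested-list tree encoding and the helper names (`parse`, `to_str`, `find_parent`, `unwind_passive`) are hypothetical. For simplicity the sketch leaves the auxiliary and the participle untouched and does not handle agentless passives.

```python
import re
from typing import Optional, Union

# Hypothetical encoding: a node is [label, child, child, ...]; a leaf is a word string.
Tree = Union[str, list]


def parse(s: str) -> Tree:
    """Parse a PTB-style bracketed string (without the outer empty wrapper) into nested lists."""
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    pos = 0

    def read() -> Tree:
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok != "(":
            return tok  # a word
        node = [tokens[pos]]  # the constituent label
        pos += 1
        while tokens[pos] != ")":
            node.append(read())
        pos += 1  # consume the closing ")"
        return node

    return read()


def to_str(t: Tree) -> str:
    """Serialize nested lists back into a bracketed string."""
    if isinstance(t, str):
        return t
    return "(" + " ".join([t[0]] + [to_str(c) for c in t[1:]]) + ")"


def find_parent(t: Tree, pred) -> Optional[tuple]:
    """Return (parent, index) of the first descendant node satisfying pred, or None."""
    if isinstance(t, str):
        return None
    for i, child in enumerate(t[1:], start=1):
        if not isinstance(child, str) and pred(child):
            return t, i
        found = find_parent(child, pred)
        if found is not None:
            return found
    return None


def unwind_passive(s_node: list) -> bool:
    """Rewrite a simple passive clause in place; return True if one was rewritten."""
    for i, subj in enumerate(s_node[1:], start=1):
        if isinstance(subj, str):
            continue
        m = re.match(r"^NP-SBJ-(\d+)$", subj[0])
        if not m:
            continue
        trace = "*-" + m.group(1)
        # Canonical object position: an NP whose only child is (-NONE- *-i).
        obj = find_parent(s_node, lambda n: n[0] == "NP" and len(n) == 2
                          and not isinstance(n[1], str)
                          and n[1][0] == "-NONE-" and n[1][1:] == [trace])
        # Logical subject: a PP containing an NP-LGS child (the by-phrase).
        by_pp = find_parent(s_node, lambda n: n[0].startswith("PP") and any(
            not isinstance(c, str) and c[0] == "NP-LGS" for c in n[1:]))
        if obj is None or by_pp is None:
            return False
        pp_parent, pp_idx = by_pp
        agent = next(c for c in pp_parent[pp_idx][1:]
                     if not isinstance(c, str) and c[0] == "NP-LGS")
        obj_parent, obj_idx = obj
        # Surface subject returns to the trace (object) position ...
        obj_parent[obj_idx] = ["NP"] + subj[1:]
        # ... the logical subject becomes the subject, and the by-phrase is dropped.
        s_node[i] = ["NP-SBJ"] + agent[1:]
        del pp_parent[pp_idx]
        return True
    return False


if __name__ == "__main__":
    t = parse("(S (NP-SBJ-1 (DT The) (NN ball)) (VP (VBD was) (VP (VBN hit) "
              "(NP (-NONE- *-1)) (PP (IN by) (NP-LGS (DT the) (NN boy))))))")
    unwind_passive(t)
    print(to_str(t))
    # prints (S (NP-SBJ (DT the) (NN boy)) (VP (VBD was) (VP (VBN hit) (NP (DT The) (NN ball)))))
```

In a joint-representation setup of the kind the abstract describes, a parser would see both the original surface tree and a rewritten tree like this one; how the two are actually combined is not specified here, so the sketch stops at producing the second structure.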
Igor Malioutov, Robert C. Berwick
Added 13 Feb 2011
Updated 13 Feb 2011
Type Journal
Year 2010
Where ISDA