Supervised and unsupervised PCFG adaptation to novel domains

This paper investigates adapting a lexicalized probabilistic context-free grammar (PCFG) to a novel domain using maximum a posteriori (MAP) estimation. The MAP framework is general enough to include some previous model adaptation approaches, such as the corpus mixing of Gildea (2001). Other approaches that fall within this framework are more effective. In contrast to the results in Gildea (2001), we show F-measure parsing accuracy gains of as much as 2.5% for high-accuracy lexicalized parsing through the use of out-of-domain treebanks, with the largest gains when the amount of in-domain data is small. MAP adaptation can also be based on either supervised or unsupervised adaptation data. Even when no in-domain treebank is available, unsupervised techniques provide a substantial accuracy gain over unadapted grammars: an F-measure improvement of nearly 5%.
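To make the MAP idea concrete, here is a minimal sketch of one generic form of MAP adaptation for PCFG rule probabilities. It is not necessarily the estimator used in the paper. The sketch assumes a Dirichlet prior centered on the out-of-domain relative frequencies with a single strength parameter tau, which gives P(A -> beta) = (tau * P_out(A -> beta) + c_in(A -> beta)) / (tau + c_in(A)). It also works on plain (unlexicalized) rule counts only, whereas the paper adapts a lexicalized grammar. The names `relative_frequencies`, `map_adapt`, and `tau` are hypothetical. For the unsupervised case, one plausible source of `in_counts` is parser output on unannotated in-domain text; that is an assumption here, not a detail taken from the abstract.

```python
from collections import Counter, defaultdict


def relative_frequencies(counts):
    """Maximum-likelihood rule probabilities P(A -> beta) from rule counts.

    Rules are hypothetical (lhs, rhs_tuple) pairs, e.g. ("NP", ("DT", "NN")).
    """
    totals = defaultdict(float)
    for (lhs, _), c in counts.items():
        totals[lhs] += c
    return {rule: c / totals[rule[0]] for rule, c in counts.items()}


def map_adapt(out_counts, in_counts, tau):
    """Sketch of MAP adaptation under an assumed Dirichlet prior.

    The prior is centered on the out-of-domain relative frequencies and
    weighted by tau (a hypothetical strength parameter, tau > 0). Larger
    tau trusts the out-of-domain grammar more; tau -> 0 approaches the
    in-domain maximum-likelihood estimate.
    """
    assert tau > 0, "tau must be positive"
    prior = relative_frequencies(out_counts)
    lhs_with_prior = {lhs for lhs, _ in prior}
    in_totals = defaultdict(float)
    for (lhs, _), c in in_counts.items():
        in_totals[lhs] += c
    adapted = {}
    for rule in set(prior) | set(in_counts):
        lhs = rule[0]
        # The prior contributes tau pseudo-counts only for nonterminals it covers.
        denom = (tau if lhs in lhs_with_prior else 0.0) + in_totals[lhs]
        adapted[rule] = (tau * prior.get(rule, 0.0) + in_counts.get(rule, 0)) / denom
    return adapted


if __name__ == "__main__":
    out_counts = Counter({("NP", ("DT", "NN")): 8, ("NP", ("NNP",)): 2})
    in_counts = Counter({("NP", ("NNP",)): 3, ("NP", ("PRP",)): 1})
    for rule, p in sorted(map_adapt(out_counts, in_counts, tau=4).items()):
        print(rule, round(p, 3))
```

With these toy counts, the adapted probabilities for NP are 0.4 (DT NN), 0.475 (NNP), and 0.125 (PRP), and they still sum to one. If tau were set separately for each nonterminal A to its out-of-domain count c_out(A), the formula would reduce to pooling the raw counts of both treebanks. That is one way to see corpus mixing as a special case of MAP estimation.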
Brian Roark, Michiel Bacchiani
Type Conference
Year 2003