Accurate Unlexicalized Parsing for Modern Hebrew

Many state-of-the-art statistical parsers for English can be viewed as Probabilistic Context-Free Grammars (PCFGs) acquired from treebanks consisting of phrase-structure trees enriched with a variety of contextual, derivational (e.g., markovization) and lexical information. In this paper we empirically investigate the applicability and adequacy of the unlexicalized variety of such parsing models to Modern Hebrew, a Semitic language that differs in structure and characteristics from English. We show that, contrary to experience with parsing the WSJ, the markovized, head-driven unlexicalized variety does not necessarily outperform plain PCFGs for Semitic languages. We demonstrate that enriching unlexicalized PCFGs with morphologically marked agreement features percolated up the parse tree (e.g., definiteness) outperforms plain PCFGs as well as a simple head-driven variation on the Modern Hebrew (MH) treebank. We further show that an (unlexicalized) head-driven variety enriched with the same features ac...
Reut Tsarfaty, Khalil Sima'an
Added 09 Jun 2010
Updated 09 Jun 2010
Type: Conference
Year: 2007
Where: TSD