Sciweavers

LREC
2008

Word-Based or Morpheme-Based? Annotation Strategies for Modern Hebrew Clitics

13 years 5 months ago
Word-Based or Morpheme-Based? Annotation Strategies for Modern Hebrew Clitics
Morphologically rich languages pose a challenge to the annotators of treebanks with respect to the status of orthographic (spacedelimited) words in the syntactic parse trees. In such languages an orthographic word may carry various, distinct, sorts of information and the question arises whether we should represent such words as a sequence of their constituent morphemes (i.e., a Morpheme-Based annotation strategy) or whether we should preserve their special orthographic status within the trees (i.e., a Word-Based annotation strategy). In this paper we empirically address this challenge in the context of the development of Language Resources for Modern Hebrew. We compare and contrast the Morpheme-Based and Word-Based annotation strategies of pronominal clitics in Modern Hebrew and we show that the Word-Based strategy is more adequate for the purpose of training statistical parsers as it provides a better PP-attachment disambiguation capacity and a better alignment with initial surface f...
Reut Tsarfaty, Yoav Goldberg
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference
Year 2008
Where LREC
Authors Reut Tsarfaty, Yoav Goldberg
Comments (0)