LREC
2010

Consistent and Flexible Integration of Morphological Annotation in the Arabic Treebank

Complications arise for standoff annotation when the annotation is not on the source text itself, but on a more abstract representation. This is particularly the case in a language such as Arabic, with its morphological and orthographic challenges, and we discuss various aspects of these issues in the context of the Arabic Treebank. The Standard Arabic Morphological Analyzer (SAMA) is closely integrated into the annotation workflow, as the basis for the abstraction between the explicit source text and the more abstract token representation. However, this integration with SAMA gives rise to various problems for the annotation workflow and for maintaining the link between the Treebank and SAMA. In this paper we discuss how we have overcome these problems with a consistent and more precise categorization of all tokens according to their relationship with SAMA. We also discuss how we have improved the creation of several distinct alternative forms of the tokens used in the syntactic trees. As a result, the Treeba...
Seth Kulick, Ann Bies, Mohamed Maamouri
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference
Year 2010
Where LREC