Sciweavers

LREC
2010

Arabic Word Segmentation for Better Unit of Analysis

The Arabic language has a very rich morphology in which a word is composed of zero or more prefixes, a stem, and zero or more suffixes. This makes Arabic data sparse compared to other languages, such as English, and consequently word segmentation becomes very important for many Natural Language Processing tasks that deal with the Arabic language. In this paper we present two segmentation schemes, morphological segmentation and Arabic TreeBank segmentation, and we show their impact on an important natural language processing task, mention detection. Experiments on the Arabic TreeBank corpus show 98.1% accuracy on morphological segmentation and 99.4% on Arabic TreeBank segmentation. We also discuss the importance of segmenting the text: experiments show an improvement of up to 6 F-measure points in mention detection performance when morphological segmentation is used instead of leaving the text unsegmented. The results also show that an improvement of up to 3 F-measure points is achieved when the appropria...
Yassine Benajiba, Imed Zitouni
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference
Year 2010
Where LREC
Authors Yassine Benajiba, Imed Zitouni