Sciweavers

IICAI
2007

ANERsys 2.0: Conquering the NER Task for the Arabic Language by Combining the Maximum Entropy with POS-tag Information

In this paper we describe an improved version of ANERsys, an Arabic Named Entity Recognition system for open-domain texts. The first version of ANERsys was based entirely on the Maximum Entropy approach and was trained and tested on corpora that we built ourselves. The results showed that Maximum Entropy is an appropriate method for identifying Named Entities in Arabic texts. However, reaching higher performance required more work on the recognition of long proper names. Therefore, in the second version of ANERsys, we use a part-of-speech tagger and a two-step approach to enhance the performance of the system. Furthermore, we used our own corpora (ANERcorp) and gazetteers (ANERgazet), now freely available on our website, to train and evaluate ANERsys 2.0. We carried out several experiments to evaluate the performance of the system and to compare it with the freely available online demo version of the commercial system Siraj (Sakhr). The r...
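As a rough illustration of how POS-tag information can be combined with a Maximum Entropy classifier for token-level entity tagging, the sketch below trains a tiny conditional model p(y | x) proportional to exp(sum of feature weights) using plain gradient ascent. It is not the ANERsys 2.0 implementation: the feature set, the Penn-Treebank-style POS tags, the transliterated toy sentences, the label set (PER, LOC, O), and the function names `features`, `predict_proba`, `train`, and `tag` are all assumptions made for this example, and the two-step approach mentioned in the abstract is not modelled here.

```python
import math
from collections import defaultdict

LABELS = ["PER", "LOC", "O"]  # hypothetical label set for this toy example


def features(tokens, pos_tags, i):
    """Binary features for token i: word form, its POS tag, previous POS tag."""
    prev_pos = pos_tags[i - 1] if i > 0 else "<s>"
    return [f"word={tokens[i]}", f"pos={pos_tags[i]}", f"prev_pos={prev_pos}"]


def predict_proba(weights, feats):
    """Maximum entropy form: p(y | x) = exp(sum_f w[y, f]) / Z."""
    scores = {y: sum(weights[(y, f)] for f in feats) for y in LABELS}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}


def train(sentences, epochs=200, lr=0.5):
    """Stochastic gradient ascent on the conditional log-likelihood."""
    examples = [
        (features(tokens, pos_tags, i), gold[i])
        for tokens, pos_tags, gold in sentences
        for i in range(len(tokens))
    ]
    weights = defaultdict(float)
    for _ in range(epochs):
        for feats, gold_label in examples:
            probs = predict_proba(weights, feats)
            for y in LABELS:
                # Gradient: observed feature count minus expected count.
                grad = (1.0 if y == gold_label else 0.0) - probs[y]
                for f in feats:
                    weights[(y, f)] += lr * grad
    return weights


def tag(weights, tokens, pos_tags):
    """Assign each token its most probable label."""
    result = []
    for i in range(len(tokens)):
        probs = predict_proba(weights, features(tokens, pos_tags, i))
        result.append(max(probs, key=probs.get))
    return result


# Toy training data (invented for illustration; not taken from ANERcorp).
TRAIN = [
    (["Yassine", "lives", "in", "Valencia"],
     ["NNP", "VBZ", "IN", "NNP"],
     ["PER", "O", "O", "LOC"]),
    (["Paolo", "works", "in", "Spain"],
     ["NNP", "VBZ", "IN", "NNP"],
     ["PER", "O", "O", "LOC"]),
]

weights = train(TRAIN)
# Unseen names: the word features carry no weight, so only POS context decides.
predicted = tag(weights, ["Ahmed", "lives", "in", "Cairo"],
                ["NNP", "VBZ", "IN", "NNP"])
print(predicted)
```

In this toy setup the word-form features are uninformative for names not seen in training, so the prediction for an unseen proper noun rests on its POS tag and the POS tag of the preceding token. That is one plausible way POS information could help a Maximum Entropy tagger; the features actually used in ANERsys 2.0 are not stated in the abstract.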
Yassine Benajiba, Paolo Rosso
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference
Year 2007
Where IICAI
Authors Yassine Benajiba, Paolo Rosso