SIGIR
2002
ACM

Improving stemming for Arabic information retrieval: light stemming and co-occurrence analysis

Arabic, a highly inflected language, requires good stemming for effective information retrieval, yet no standard approach to stemming has emerged. We developed several light stemmers based on heuristics and a statistical stemmer based on co-occurrence for Arabic retrieval. We compared the retrieval effectiveness of our stemmers and of a morphological analyzer on the TREC-2001 data. The best light stemmer was more effective for cross-language retrieval than a morphological stemmer which tried to find the root for each word. A repartitioning process consisting of vowel removal followed by clustering using co-occurrence analysis produced stem classes which were better than no stemming or very light stemming, but still inferior to good light stemming or morphological analysis.

Categories and Subject Descriptors: H.3.1 [Information Storage and Retrieval]: Content Analysis and Indexing
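To make the idea of heuristic light stemming concrete, here is a minimal Python sketch. It is not the authors' stemmer: the abstract does not list which affixes are removed, so the prefix and suffix lists, the `min_len` threshold, and the function name `light_stem` are all assumptions chosen for illustration. The sketch strips diacritics, then at most one common prefix (definite article and conjunction forms) and at most one common suffix, without trying to find the root.

```python
import re

# Assumed affix lists for illustration only; the abstract does not enumerate them.
# Longer affixes come first so that the longest match wins.
PREFIXES = ["وال", "بال", "كال", "فال", "ال", "لل"]
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "ية", "ه", "ة", "ي"]

# Arabic short-vowel and related diacritic marks (U+064B through U+0652).
DIACRITICS = re.compile("[\u064B-\u0652]")


def light_stem(word, min_len=3):
    """Strip diacritics, then at most one prefix and one suffix.

    An affix is removed only if at least `min_len` characters remain
    (an assumed safeguard against over-stemming short words).
    """
    word = DIACRITICS.sub("", word)

    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= min_len:
            word = word[len(prefix):]
            break

    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_len:
            word = word[:-len(suffix)]
            break

    return word


if __name__ == "__main__":
    for w in ["الكتاب", "والمعلمون", "مكتبة"]:
        print(w, "->", light_stem(w))
```

Unlike a morphological analyzer, this kind of stemmer conflates only surface variants that share the same affix-stripped form, which is the trade-off the abstract compares against root-finding.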
Leah S. Larkey, Lisa Ballesteros, Margaret E. Connell
Added 23 Dec 2010
Updated 23 Dec 2010
Type Conference
Year 2002
Where SIGIR
Authors Leah S. Larkey, Lisa Ballesteros, Margaret E. Connell