Sciweavers

SIGIR
2002
ACM

Empirical studies in strategies for Arabic retrieval

13 years 4 months ago
Empirical studies in strategies for Arabic retrieval
This work evaluates a few search strategies for Arabic monolingual and cross-lingual retrieval, using the TREC Arabic corpus as the test-bed. The release by NIST in 2001 of an Arabic corpus of nearly 400k documents with both monolingual and cross-lingual queries and relevance judgments has been a new enabler for empirical studies. Experimental results show that spelling normalization and stemming can significantly improve Arabic monolingual retrieval. Character tri-grams from stems improved retrieval modestly on the test corpus, but the improvement is not statistically significant. To further improve retrieval, we propose a novel thesaurus-based technique. Different from existing approaches to thesaurus-based retrieval, ours formulates word synonyms as probabilistic term translations that can be automatically derived from a parallel corpus. Retrieval results show that the thesaurus can significantly improve Arabic monolingual retrieval. For cross-lingual retrieval (CLIR), we found tha...
Jinxi Xu, Alexander Fraser, Ralph M. Weischedel
Added 23 Dec 2010
Updated 23 Dec 2010
Type Journal
Year 2002
Where SIGIR
Authors Jinxi Xu, Alexander Fraser, Ralph M. Weischedel
Comments (0)