IR
2007

Searching strategies for the Bulgarian language

This paper reports on the underlying IR problems encountered when indexing and searching documents in the Bulgarian language. For this language we propose a general light stemmer and demonstrate that it can be quite effective, producing a significantly better MAP (around +34%) than an approach without stemming. We implement the GL2 model derived from the Divergence from Randomness paradigm and find its retrieval effectiveness better than that of other probabilistic, vector-space and language models. The resulting MAP is about 50% better than that of the classical tf-idf approach. Moreover, increasing the query size from title-only (T) to title-and-description (TD) queries enhances the MAP by around 10%. In order to compare the retrieval effectiveness of our suggested stopword list and the light stemmer developed for the Bulgarian language, we conduct a set of experiments with another stopword list and a more complex and aggressive stemmer. Results tend to indicate that there is no statistically significant difference between thes...
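The abstract reports its comparisons as relative changes in MAP (mean average precision). As a rough illustration of how such figures are typically computed, here is a minimal Python sketch. It assumes binary relevance judgments and the standard definition of average precision; the function names, document identifiers, and numbers are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average precision of one ranked list under binary relevance."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision measured at the rank of each relevant document.
            precision_sum += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Dict[str, List[str]], qrels: Dict[str, Set[str]]
) -> float:
    """Mean of the per-query average precision over all judged queries."""
    total = sum(average_precision(runs.get(q, []), rel) for q, rel in qrels.items())
    return total / len(qrels)


def relative_change(baseline: float, improved: float) -> float:
    """Relative difference in percent, e.g. stemming vs. no stemming."""
    return (improved - baseline) / baseline * 100.0


# Toy data (hypothetical): two queries with made-up document identifiers.
toy_qrels = {"q1": {"d1", "d3"}, "q2": {"d1"}}
toy_runs = {"q1": ["d1", "d2", "d3", "d4"], "q2": ["d2", "d1"]}
toy_map = mean_average_precision(toy_runs, toy_qrels)
```

With this definition, a statement such as "around +34%" corresponds to `relative_change(map_without_stemming, map_with_stemming)`, where both arguments are MAP values measured over the same query set.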
Jacques Savoy
Added 15 Dec 2010
Updated 15 Dec 2010
Type Journal
Year 2007
Where IR
Authors Jacques Savoy