Stemming Approaches for East European Languages

In our participation in this CLEF evaluation campaign, our first objective is to propose and evaluate various indexing and search strategies for the Czech language, with the aim of achieving better retrieval effectiveness than the language-independent approach (n-gram). Based on the stemming strategy we used for other languages, we propose two light stemmers for this Slavic language and a third one based on a more aggressive suffix-stripping scheme that removes some derivational suffixes. Our second objective is to obtain a clearer picture of the relative merits of various search engines when searching Hungarian and Bulgarian documents. Moreover, for the Bulgarian language we developed a new, more aggressive stemmer. To evaluate these solutions we use various IR models, including Okapi, Divergence from Randomness (DFR) and a statistical language model (LM), together with the classical tf.idf vector-processing approach. Our experiments tend to show that for the Bulgarian lan...
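To make the idea of a "light" stemmer concrete, here is a minimal sketch of generic longest-suffix stripping. It is not the authors' stemmer: the suffix list (`LIGHT_SUFFIXES`), the minimum stem length (`MIN_STEM_LEN`) and the function name `light_stem` are hypothetical, chosen only to illustrate how a light stemmer removes inflectional endings while leaving the stem intact.

```python
# Minimal sketch of a light suffix-stripping stemmer.
# ASSUMPTION: LIGHT_SUFFIXES is a hypothetical, illustrative subset of
# Czech inflectional endings, NOT the rule set used by the authors.
LIGHT_SUFFIXES = [
    "ech", "ách", "ích", "ého", "ému", "ými", "ové",
    "em", "ou", "ům",
    "a", "e", "i", "u", "y", "í", "é", "ý", "o",
]

# ASSUMPTION: never reduce a word to fewer than 3 characters.
MIN_STEM_LEN = 3


def light_stem(word: str) -> str:
    """Strip the longest matching suffix if enough of the stem remains."""
    word = word.lower()
    # Try longer suffixes first so "-ech" wins over "-e" or "-h".
    for suffix in sorted(LIGHT_SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            return word[: -len(suffix)]
    return word


if __name__ == "__main__":
    # Several inflected forms of "hrad" (castle) conflate to one index term.
    for w in ["hrad", "hradu", "hradem", "hradech"]:
        print(w, "->", light_stem(w))
```

A more aggressive stemmer, as described in the abstract, would add a second pass that also removes derivational suffixes. This conflates more word forms, at the risk of merging unrelated words.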
Ljiljana Dolamic, Jacques Savoy
Added 07 Jun 2010
Updated 07 Jun 2010
Type Conference
Year 2007
Where CLEF