IR
2007

Restricted inflectional form generation in management of morphological keyword variation

Word form normalization through lemmatization or stemming is a standard procedure in information retrieval because morphological variation needs to be accounted for and several languages are morphologically non-trivial. Lemmatization is effective but often requires expensive resources. Stemming is also effective in most contexts, generally almost as good as lemmatization and typically much less expensive; it also has a query expansion effect. However, in both approaches the idea is to turn many inflectional word forms into a single lemma or stem, both in the database index and in queries. This means extra effort in creating database indexes. In this paper we take the opposite approach: we leave the database index un-normalized and enrich the queries to cover the surface form variation of keywords. A potential penalty of this approach would be long queries and slow processing. However, we show that it only matters to cover a negligible number of possible surface forms even in morpho...
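
The general idea in the abstract (an un-normalized index combined with queries enriched by a restricted set of generated surface forms) might be sketched as follows. This is only an illustrative toy, not the authors' method: the suffix list `TOY_SUFFIXES`, the function names, and the whitespace tokenizer are all assumptions made for the example, and a real generator for a morphologically complex language would need language-specific rules.

```python
from typing import Dict, List, Optional, Set

# Hypothetical, deliberately tiny suffix list (an assumption for illustration,
# not taken from the paper). It stands in for a restricted form generator.
TOY_SUFFIXES: List[str] = ["", "s", "'s"]


def generate_forms(keyword: str, suffixes: List[str] = TOY_SUFFIXES) -> List[str]:
    """Return a restricted set of candidate surface forms for one keyword."""
    return [keyword + suffix for suffix in suffixes]


def build_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Build an un-normalized inverted index: tokens are stored as they occur
    (lowercased only), with no stemming or lemmatization."""
    index: Dict[str, Set[int]] = {}
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index.setdefault(token, set()).add(doc_id)
    return index


def search(index: Dict[str, Set[int]], keywords: List[str]) -> Set[int]:
    """AND across keywords, OR across the generated forms of each keyword."""
    result: Optional[Set[int]] = None
    for keyword in keywords:
        hits: Set[int] = set()
        for form in generate_forms(keyword.lower()):
            hits |= index.get(form, set())
        result = hits if result is None else result & hits
    return result or set()


docs = {
    1: "the index stores words",
    2: "one word per line",
    3: "no match here",
}
index = build_index(docs)
print(search(index, ["word"]))          # {1, 2}
print(search(index, ["word", "line"]))  # {2}
```

The design choice mirrors the trade-off the abstract names: indexing stays cheap because nothing is normalized, while each query grows by a factor equal to the number of generated forms, so keeping that set small matters.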
Kimmo Kettunen, Eija Airio, Kalervo Järvelin
Added 15 Dec 2010
Updated 15 Dec 2010
Type Journal
Year 2007
Where IR