Putting Successor Variety Stemming to Work

Stemming algorithms find canonical forms for inflected words, e.g., for declined nouns or conjugated verbs. Since such a unification of words with respect to gender, number, tense, and case is a language-specific issue, stemming algorithms operationalize a set of linguistically motivated rules for the language in question. The best-known rule-based algorithm for the English language is Porter's [14]. The paper presents a statistical stemming approach that is based on the analysis of the distribution of word prefixes in a document collection and is therefore largely language-independent. In particular, our approach addresses the problem of index construction for multilingual documents. Related work on statistical stemming focuses either on stemming quality [2,3] or on runtime performance [11], but neither provides a reasonable tradeoff between the two. For selected retrieval tasks under vector-based document models we report on new results related to stemming quality and collect...
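To make the idea of analyzing word-prefix distributions concrete, here is a minimal Python sketch of the classic successor variety notion: for each prefix of a word, count how many distinct letters follow that prefix anywhere in the vocabulary, and cut the word where that count is highest. This is an illustration only and not the method from the paper. The function names, the `min_stem` parameter, the cut rule, and the toy vocabulary are all assumptions made for this example.

```python
def successor_varieties(word, vocabulary):
    """Return, for each proper prefix of `word`, the number of distinct
    letters that follow that prefix in any vocabulary entry."""
    successors = {}  # prefix -> set of letters seen directly after it
    for entry in vocabulary:
        for i in range(1, len(entry)):
            successors.setdefault(entry[:i], set()).add(entry[i])
    # Element k of the result belongs to the prefix of length k + 1.
    return [len(successors.get(word[:i], ())) for i in range(1, len(word))]


def stem_at_max_variety(word, vocabulary, min_stem=2):
    """Cut `word` after the prefix with the highest successor variety.

    Simplifying assumption: prefixes shorter than `min_stem` are ignored,
    ties go to the shorter prefix, and the word is returned unchanged if
    no prefix has a successor variety above 1.
    """
    best_len, best_sv = len(word), 1
    for k, variety in enumerate(successor_varieties(word, vocabulary)):
        length = k + 1
        if length >= min_stem and variety > best_sv:
            best_len, best_sv = length, variety
    return word[:best_len]


# Hypothetical toy vocabulary, chosen only to illustrate the counts.
vocabulary = ["read", "reads", "reading", "reader", "red", "rope", "ripe"]

print(successor_varieties("reading", vocabulary))  # [3, 2, 1, 3, 1, 1]
print(stem_at_max_variety("reading", vocabulary))  # read
```

In this toy vocabulary the prefix "read" is followed by three different letters (s, i, e), so the cut falls there. Because the counts come only from the collection itself, no language-specific rules are needed, which is the property the abstract highlights.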

Benno Stein, Martin Potthast

Data Mining | GFKL 2006 | Statistical Stemming Approach | Statistical Stemming Focuses | Stemming Algorithms |

Post Info

Added	23 Aug 2010
Updated	23 Aug 2010
Type	Conference
Year	2006
Where	GFKL
Authors	Benno Stein, Martin Potthast
