
ECIR
2011
Springer

Classifying with Co-stems - A New Representation for Information Filtering

Besides content, writing style is an important discriminator in information filtering tasks. Ideally, the solution to a filtering task employs a text representation that models both kinds of characteristics. In this respect, word stems clearly capture content, whereas word suffixes qualify as indicators of writing style. Though the latter feature type is used for part-of-speech tagging, it has not yet been employed for information filtering in general. We propose a text representation that combines the output of a stemming algorithm (stems) with the stem-reduced words (co-stems). A co-stem can be a prefix, an infix, a suffix, or a concatenation of prefixes, infixes, or suffixes. Using accepted standard corpora, we analyze the discriminative power of this representation for a broad range of information filtering tasks to provide new insights into the adequacy and task-specificity of text representation models. Altogether we observe that co-stem-based representat...
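To make the stem/co-stem split concrete, here is a minimal Python sketch of how such a combined feature set might be built. It is not the authors' implementation. The suffix list, the `toy_stem` stemmer, the `co_stem` and `features` helpers, and the `stem=` / `costem=` feature prefixes are all hypothetical names chosen for illustration. A real system would presumably use an established stemming algorithm in place of the toy one.

```python
from collections import Counter

# Hypothetical toy suffix list (an assumption for illustration only);
# a real system would use an established stemmer instead.
SUFFIXES = ("ingly", "edly", "ing", "ed", "ly", "es", "s")


def toy_stem(word):
    """Strip the first matching suffix, keeping a stem of at least 3 characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def co_stem(word, stem):
    """Return what is left of the word once the stem is cut out.

    The remainder may be a prefix, a suffix, or both concatenated.
    """
    pos = word.find(stem)
    if pos < 0:
        # The stemmer rewrote characters; this sketch does not handle that case.
        return ""
    return word[:pos] + word[pos + len(stem):]


def features(text):
    """Count stem and co-stem features over whitespace-separated tokens."""
    counts = Counter()
    for token in text.lower().split():
        word = token.strip(".,;:!?\"'()")
        if not word:
            continue
        stem = toy_stem(word)
        counts["stem=" + stem] += 1
        rest = co_stem(word, stem)
        if rest:
            counts["costem=" + rest] += 1
    return counts


if __name__ == "__main__":
    for name, count in sorted(features("He walked slowly, talking loudly.").items()):
        print(name, count)
```

In this sketch the stem features carry the content words (`walk`, `talk`), while the co-stem features carry the affixes (`ed`, `ing`, `ly`) that the abstract associates with writing style. Both feature types share one count vector, which could then be passed to any standard classifier.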
Nedim Lipka, Benno Stein
Added 27 Aug 2011
Updated 27 Aug 2011
Type Journal
Year 2011
Where ECIR
Authors Nedim Lipka, Benno Stein