Thematic Segment Retrieval Revisited

Documents, especially long ones, may contain very diverse passages related to different topics. Passage retrieval approaches have shown that, in most cases, there is great potential benefit in considering these passages independently when computing the similarity of a document to a user’s query. Experiments have been conducted to identify the kinds of passages best suited to such a process. Contrary to what might have been expected, working with thematic segments, which are likely to represent only one topic each, has led to much lower effectiveness than using arbitrary sequences of words. In this paper, we show that this paradoxical observation is mainly due to biases induced by the wide variation in length of the thematic passages. We therefore propose to cope with these biases by using a more powerful text length normalization technique. Experiments show that, when length biases are set aside, the use of thematic passages is bet...
Added 01 Jun 2010
Updated 01 Jun 2010
Type Conference
Year 2008
Authors Sylvain Lamprier, Tassadit Amghar, Bernard Levrat, Frédéric Saubion