
SIGIR
2012
ACM

To index or not to index: time-space trade-offs in search engines with positional ranking functions

Positional ranking functions, widely used in web search engines, improve result quality by exploiting the positions of the query terms within documents. However, it is well known that positional indexes demand large amounts of extra space, typically about three times the space of a basic nonpositional index. Textual data, on the other hand, is needed to produce text snippets. In this paper, we study time-space trade-offs for search engines with positional ranking functions and text snippet generation. We consider both index-based and non-index-based alternatives for positional data. We aim to answer the question of whether one should index positional data or not. We show that there is a wide range of practical time-space trade-offs. Moreover, we show that both positional and textual data can be stored using about 71% of the space used by traditional positional indexes, with a minor increase in query time. This yields considerable space savings and outperforms, both in space and time, r...
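The abstract contrasts two ways of obtaining term positions: storing them in a positional index, or recovering them at query time from the stored document text, which a snippet generator needs anyway. The toy Python sketch below illustrates that contrast and one simple proximity feature a positional ranking function might use. It assumes a plain lowercase word tokenizer, and all function names are hypothetical. It is not the authors' implementation: real engines compress both posting lists and text, and that compression is where the space and time trade-offs studied in the paper arise.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Simplifying assumption: lowercase alphanumeric words are the terms.
    return re.findall(r"\w+", text.lower())


def build_nonpositional_index(docs):
    """term -> {doc_id: term frequency}; no positions are stored."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for tok in tokenize(text):
            index[tok][doc_id] = index[tok].get(doc_id, 0) + 1
    return index


def build_positional_index(docs):
    """term -> {doc_id: [positions]}; the position lists are the extra space."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, tok in enumerate(tokenize(text)):
            index[tok][doc_id].append(pos)
    return index


def positions_from_text(text, term):
    """Non-index alternative: rescan the stored text to recover positions."""
    return [pos for pos, tok in enumerate(tokenize(text)) if tok == term]


def min_distance(a, b):
    """Smallest |i - j| with i in a, j in b (both sorted); None if either is empty."""
    if not a or not b:
        return None
    i = j = 0
    best = abs(a[0] - b[0])
    # Two-pointer merge: always advance the pointer at the smaller position.
    while i < len(a) and j < len(b):
        best = min(best, abs(a[i] - b[j]))
        if a[i] < b[j]:
            i += 1
        else:
            j += 1
    return best


if __name__ == "__main__":
    docs = {
        1: "positional ranking uses term positions",
        2: "ranking without positions ranking",
    }
    pidx = build_positional_index(docs)
    # Both routes yield the same positions; they differ in space and query time.
    print(pidx["ranking"][2], positions_from_text(docs[2], "ranking"))
    print(min_distance(pidx["ranking"][2], pidx["positions"][2]))
```

Storing positions makes the proximity computation a cheap lookup but enlarges the index. Rescanning the text avoids that storage but pays a decoding and tokenizing cost for each candidate document at query time.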
Type Conference
Year 2012
Where SIGIR
Authors Diego Arroyuelo, Senén González, Mauricio Marín, Mauricio Oyarzún, Torsten Suel