
SPIRE
2005
Springer

Compressed Perfect Embedded Skip Lists for Quick Inverted-Index Lookups

Large inverted indices are by now common in the construction of web-scale search engines. For faster access, inverted indices are indexed internally so that it is possible to skip quickly over unnecessary documents. The classical approach to skipping dictates that a skip should be positioned every √f document pointers, where f is the overall number of documents in which the term appears. We argue that, due to the growing size of the web, more refined techniques are necessary, and describe how to embed a compressed perfect skip list in an inverted list. We provide statistical models that explain the empirical distribution of the skip data we observe in our experiments, and use them to devise good compression techniques that limit the waste in space, so that the resulting data structure increases the overall index size by just a few percent while still making it possible to index pointers with a rather fine granularity.
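The classical √f skipping scheme mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the compressed perfect embedded skip list the paper proposes. The function names `build_skips` and `skip_to` are hypothetical, the posting list is assumed to be a sorted, uncompressed list of document pointers held in memory, and the skip table is assumed to live in a separate array rather than being embedded in the list.

```python
import math
from bisect import bisect_right


def build_skips(postings):
    """Place one skip entry every floor(sqrt(f)) pointers (assumed layout).

    postings: sorted list of document pointers for one term.
    Returns (step, skips), where skips[i] == postings[i * step].
    """
    f = len(postings)
    step = max(1, math.isqrt(f))
    skips = [postings[i] for i in range(0, f, step)]
    return step, skips


def skip_to(postings, step, skips, target):
    """Return the position of the first pointer >= target, or len(postings)."""
    # Last block whose first pointer is <= target (block 0 if there is none).
    block = max(0, bisect_right(skips, target) - 1)
    start = block * step
    end = min(len(postings), start + step)
    # Scan linearly inside that single block only.
    for pos in range(start, end):
        if postings[pos] >= target:
            return pos
    # Every pointer in the block is smaller than target, so the answer is
    # the first pointer of the next block (or the end of the list).
    return end


postings = [2, 3, 5, 8, 13, 21, 34, 55, 89]
step, skips = build_skips(postings)
position = skip_to(postings, step, skips, 20)
```

With f = 9 pointers the sketch places a skip every 3 pointers, so a lookup inspects the skip table and then at most one block of 3 pointers instead of scanning the whole list. The finer-grained skipping the abstract argues for would replace this single-level table with a multi-level structure.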
Paolo Boldi, Sebastiano Vigna
Added: 28 Jun 2010
Updated: 28 Jun 2010
Type: Conference
Year: 2005
Where: SPIRE
Authors: Paolo Boldi, Sebastiano Vigna