Compressed Perfect Embedded Skip Lists for Quick Inverted-Index Lookups

Large inverted indices are by now common in the construction of web-scale search engines. For faster access, inverted indices are indexed internally so that it is possible to skip quickly over unnecessary documents. The classical approach to skipping dictates that a skip should be positioned every √f document pointers, where f is the overall number of documents in which the term appears. We argue that, due to the growing size of the web, more refined techniques are necessary, and describe how to embed a compressed perfect skip list in an inverted list. We provide statistical models that explain the empirical distribution of the skip data we observe in our experiments, and use them to devise good compression techniques that allow us to limit the waste in space, so that the resulting data structure increases the overall index size by just a few percent, while still making it possible to index pointers with a rather fine granularity.
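As a rough illustration of the classical approach mentioned in the abstract (not the compressed perfect skip list the paper proposes), the sketch below places one skip entry every ⌊√f⌋ pointers in an uncompressed, sorted posting list. The function names `build_skips` and `skip_to` are hypothetical and chosen only for this example, and the posting list is assumed to be a plain Python list of document ids.

```python
import math
from bisect import bisect_left

def build_skips(postings):
    """Sample the sorted posting list every floor(sqrt(f)) entries.

    Returns (step, skips), where skips[i] == postings[i * step].
    Hypothetical helper, for illustration only.
    """
    f = len(postings)
    step = max(1, math.isqrt(f))
    return step, postings[::step]

def skip_to(postings, step, skips, target):
    """Return the index of the first posting >= target, or len(postings)."""
    # First skip entry that is >= target; the block just before it is the
    # only place where the answer can start.
    block = bisect_left(skips, target)
    start = max(0, (block - 1) * step)
    # Linear scan from the start of that block.
    for i in range(start, len(postings)):
        if postings[i] >= target:
            return i
    return len(postings)

postings = [2, 5, 8, 13, 21, 34, 55, 89, 144]
step, skips = build_skips(postings)
index = skip_to(postings, step, skips, 21)
```

With f = 9 the step is 3, so the skip entries are the pointers at positions 0, 3 and 6, and a lookup scans at most a couple of blocks instead of the whole list. The abstract's argument is that this fixed granularity becomes inadequate as lists grow, which motivates the finer-grained embedded skip structure.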

Paolo Boldi, Sebastiano Vigna

Compressed Perfect Skip | Large Inverted Indices | Skip | SPIRE 2005 |

Added	28 Jun 2010
Updated	28 Jun 2010
Type	Conference
Year	2005
Where	SPIRE
Authors	Paolo Boldi, Sebastiano Vigna
