Dual-Sorted Inverted Lists

10 years 12 months ago
Dual-Sorted Inverted Lists
Several IR tasks rely, to achieve high efficiency, on a single pervasive data structure called the inverted index. This is a mapping from the terms in a text collection to the documents where they appear, plus some supplementary data. Different orderings in the list of documents associated to a term, and different supplementary data, fit widely different IR tasks. Index designers have to choose the right order for one such task, rendering the index difficult to use for others. In this paper we introduce a general technique, based on wavelet trees, to maintain a single data structure that offers the combined functionality of two independent orderings for an inverted index, with competitive efficiency and within the space of one compressed inverted index. We show in particular that the technique allows combining an ordering by decreasing term frequency (useful for ranked document retrieval) with an ordering by increasing document identifier (useful for phrase and Boolean queries). ...
Gonzalo Navarro, Simon J. Puglisi
Added 30 Jan 2011
Updated 30 Jan 2011
Type Journal
Year 2010
Authors Gonzalo Navarro, Simon J. Puglisi
Comments (0)