Using d-gap patterns for index compression

Sequential patterns of d-gaps exist pervasively in the inverted lists of Web document collection indices due to the cluster property. In this paper, information about d-gap sequential patterns is used as a new dimension for improving inverted index compression. We first detect d-gap sequential patterns using a novel data structure, the UpDown Tree. Based on the detected patterns, we then substitute each pattern with its pattern ID in the inverted lists that contain it. The resulting inverted lists are then coded with an existing coding scheme. Experiments show that this approach can effectively improve the compression ratio of existing codes.
Categories and Subject Descriptors: E.4 [Data]: Coding and Information Theory - Data Compaction and Compression; H.3.1 [Information Storage and Retrieval]: Content Analysis and Indexing - Indexing methods.
General Terms: Algorithms, Performance, Experimentation, Theory.
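The pipeline described in the abstract (d-gaps, pattern substitution, then an existing code) can be illustrated with a minimal Python sketch. It is not the authors' method: the UpDown Tree is not reproduced here, and a naive frequency count over fixed-length windows stands in for pattern detection. The function names, the use of 0 as the single pattern ID (valid only because document IDs are assumed to start at 1, so every d-gap is at least 1), and the choice of variable-byte coding as the "existing coding scheme" are all assumptions made for illustration.

```python
from collections import Counter


def to_dgaps(doc_ids):
    """Convert a strictly increasing list of document IDs (>= 1) to d-gaps."""
    gaps, prev = [], 0
    for d in doc_ids:
        gaps.append(d - prev)
        prev = d
    return gaps


def most_common_pattern(gap_lists, length):
    """Naive stand-in for pattern detection: the most frequent run of
    `length` consecutive d-gaps across all lists (NOT the UpDown Tree)."""
    counts = Counter(
        tuple(gaps[i:i + length])
        for gaps in gap_lists
        for i in range(len(gaps) - length + 1)
    )
    return counts.most_common(1)[0][0] if counts else None


def substitute(gaps, pattern, pattern_id=0):
    """Replace each non-overlapping occurrence of `pattern` with `pattern_id`.
    Assumption: 0 is never a valid d-gap, so it can mark the pattern."""
    out, i, n = [], 0, len(pattern)
    while i < len(gaps):
        if tuple(gaps[i:i + n]) == pattern:
            out.append(pattern_id)
            i += n
        else:
            out.append(gaps[i])
            i += 1
    return out


def vbyte_encode(numbers):
    """Variable-byte code: 7 payload bits per byte, high bit set on the last byte."""
    out = bytearray()
    for n in numbers:
        while n >= 128:
            out.append(n & 0x7F)
            n >>= 7
        out.append(n | 0x80)
    return bytes(out)


if __name__ == "__main__":
    postings = [[3, 4, 6, 9, 20, 21, 23, 26], [5, 6, 8, 11, 40]]
    gap_lists = [to_dgaps(p) for p in postings]
    pattern = most_common_pattern(gap_lists, length=3)
    for gaps in gap_lists:
        replaced = substitute(gaps, pattern)
        print(gaps, "->", replaced,
              len(vbyte_encode(gaps)), "->", len(vbyte_encode(replaced)), "bytes")
```

A real scheme would need a dictionary of many patterns and a way to keep pattern IDs distinct from ordinary d-gaps; the single reserved value used here only keeps the sketch short.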
Jinlin Chen, Terry Cook
Added 22 Nov 2009
Updated 22 Nov 2009
Type Conference
Year 2007
Where WWW
Authors Jinlin Chen, Terry Cook