Sciweavers

CIKM
2007
Springer

Sigma encoded inverted files

13 years 10 months ago
Sigma encoded inverted files
Compression of term frequency lists and very long document-id lists within an inverted file search engine are examined. Several compression schemes are compared including Elias γ and δ codes, Golomb Encoding, Variable Byte Encoding, and a class of wordbased encoding schemes including Simple-9, Relative-10 and Carryover-12. It is shown that these compression methods are not well suited to compressing these kinds of lists of numbers. Of those tested, Carryover-12 is preferred because it is both effective at compression and fast at decompression. A novel technique, Sigma Encoding prior to compression, is proposed and tested. Sigma Encoding utilizes a parameterized dictionary to reduce the number of bits necessary to store an integer. This method shows an about 0.3 bit per integer improvement over Carryover-12 while costing only about 3 extra clock cycles per integer to decompress. Categories and Subject Descriptors H.3.1 [Information Storage and Retrieval]: Content Analysis and Indexin...
Andrew Trotman, Vikram Subramanya
Added 07 Jun 2010
Updated 07 Jun 2010
Type Conference
Year 2007
Where CIKM
Authors Andrew Trotman, Vikram Subramanya
Comments (0)