
ERCIMDL
2005
Springer

Compressing Dynamic Text Collections via Phrase-Based Coding

We present a new statistical compression method, which we call Phrase Based Dense Code (PBDC), aimed at compressing large digital libraries. PBDC compresses the text collection to 30–32% of its original size, keeps the text compressed at all times, and supports efficient on-line information retrieval. The novelty of PBDC is that it supports continuous growth of the compressed text collection by automatically adapting the vocabulary both to new words and to changes in the word frequency distribution, without degrading the compression ratio. Text compressed with PBDC can be searched directly, without decompression, using fast Boyer-Moore-type algorithms. Arbitrary portions of the collection can also be decompressed. Alternative compression methods oriented to information retrieval focus on static collections and are thus less well suited to digital libraries.
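The abstract does not spell out PBDC's coding scheme, so the sketch below does not implement PBDC. It illustrates the general idea behind word-based dense codes with a simpler static relative, End-Tagged Dense Code, which is swapped in here as an assumption: words are ranked by frequency, more frequent words get shorter byte codewords, and the last byte of each codeword has its high bit set. That marker is what makes it possible to search the compressed bytes directly. The function names are hypothetical, and Python's built-in `bytes.find` stands in for a Boyer-Moore-type search.

```python
from collections import Counter


def etdc_encode(rank: int) -> bytes:
    """Return the codeword for the word with 0-based frequency rank `rank`."""
    out = [128 + rank % 128]        # final byte: high bit set (128..255)
    x = rank // 128
    while x > 0:
        x -= 1
        out.insert(0, x % 128)      # earlier bytes: high bit clear (0..127)
        x //= 128
    return bytes(out)


def compress(words):
    """Static, two-pass compression: rank words by frequency, then encode."""
    ranked = [w for w, _ in Counter(words).most_common()]
    code = {w: etdc_encode(i) for i, w in enumerate(ranked)}
    return b"".join(code[w] for w in words), code


def count_occurrences(data: bytes, codeword: bytes) -> int:
    """Count occurrences of `codeword` in compressed `data` without decoding."""
    hits = 0
    pos = data.find(codeword)
    while pos != -1:
        # A genuine match starts right after the end of another codeword,
        # i.e. at position 0 or after a byte with the high bit set.
        if pos == 0 or data[pos - 1] >= 128:
            hits += 1
        pos = data.find(codeword, pos + 1)
    return hits


words = "to be or not to be".split()
data, code = compress(words)
print(len(data), count_occurrences(data, code["to"]))
```

This sketch is static: the vocabulary is fixed after one pass over the text. The dynamic behaviour the abstract describes, where the vocabulary adapts to new words and to shifts in word frequencies as the collection grows, is not modelled here.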
Added 27 Jun 2010
Updated 27 Jun 2010
Type Conference
Year 2005
Where ERCIMDL
Authors Nieves R. Brisaboa, Antonio Fariña, Gonzalo Navarro, José R. Paramá