Sciweavers

SPIRE
1998
Springer

Direct Pattern Matching on Compressed Text

13 years 8 months ago
Direct Pattern Matching on Compressed Text
We present a fast compression and decompression technique for natural language texts. The novelty is that the exact search can be done on the compressed text directly, using any known sequential pattern matching algorithm. Approximate search can also be done e ciently without any decoding. The compression scheme uses a semi-static word-based modeling and a Hu man coding where the coding alphabet is byte-oriented rather than bit-oriented. We use the rst bit of each byte to mark the beginning of a word, which allows the searching of the compressed pattern directly on the compressed text. We achieve about 33% compression ratio for typical English texts. When searching for simple patterns, our experiments show that running our algorithm on a compressed text is almost twice as fast as running agrep on the uncompressed version of the same text. When searching complex or approximate patterns, our algorithm is up to 8 times faster than agrep.
Edleno Silva de Moura, Gonzalo Navarro, Nivio Zivi
Added 06 Aug 2010
Updated 06 Aug 2010
Type Conference
Year 1998
Where SPIRE
Authors Edleno Silva de Moura, Gonzalo Navarro, Nivio Ziviani, Ricardo A. Baeza-Yates
Comments (0)