CIAC
2000
Springer

Speeding Up Pattern Matching by Text Compression

Abstract. Byte pair encoding (BPE) is a simple universal text compression scheme. Decompression is very fast and requires little work space. Moreover, it is easy to decompress an arbitrary part of the original text. However, it has not been very popular, since compression is rather slow and the compression ratio is not as good as that of other methods such as Lempel-Ziv type compression. In this paper, we bring out a potential advantage of BPE compression. We show that it is very suitable, from a practical viewpoint, for compressed pattern matching, where the goal is to find a pattern directly in compressed text without decompressing it explicitly. We compare running times to find a pattern in (1) BPE compressed files, (2) Lempel-Ziv-Welch compressed files, and (3) original text files, in various situations. Experimental results show that pattern matching in BPE compressed text is even faster than matching in the original text. Thus the BPE compression reduces not only the disk space but als...
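For readers unfamiliar with the scheme named in the abstract, the following is a minimal sketch of byte pair encoding in general, not the authors' implementation. It assumes the common formulation in which the most frequent adjacent byte pair is repeatedly replaced by a byte value that does not occur in the input. The names `bpe_compress`, `bpe_decompress`, and `max_rules` are hypothetical, chosen only for illustration, and compressed pattern matching itself is not shown.

```python
from collections import Counter


def bpe_compress(data: bytes, max_rules: int = 64):
    """Replace the most frequent adjacent pair with an unused byte, repeatedly.

    Returns the compressed bytes and a rule table {new_symbol: (left, right)}.
    Illustrative sketch only; names and the max_rules limit are assumptions.
    """
    seq = list(data)
    present = set(seq)
    # Byte values absent from the input can serve as new symbols.
    unused = [b for b in range(256) if b not in present]
    rules = {}
    while unused and len(rules) < max_rules and len(seq) >= 2:
        pair, count = Counter(zip(seq, seq[1:])).most_common(1)[0]
        if count < 2:
            break  # no pair repeats, so another rule cannot help
        sym = unused.pop()
        rules[sym] = pair
        out, i = [], 0
        while i < len(seq):
            # Scan left to right, substituting non-overlapping occurrences.
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(sym)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return bytes(seq), rules


def bpe_decompress(data: bytes, rules) -> bytes:
    """Expand every rule symbol back into its pair, using an explicit stack."""
    out = []
    stack = list(reversed(data))
    while stack:
        s = stack.pop()
        if s in rules:
            left, right = rules[s]
            stack.append(right)  # pushed first so that left is expanded first
            stack.append(left)
        else:
            out.append(s)
    return bytes(out)


if __name__ == "__main__":
    text = b"abracadabra abracadabra"
    packed, rules = bpe_compress(text)
    print(len(text), len(packed), bpe_decompress(packed, rules) == text)
```

Decompression here needs only the rule table and a small stack, which is consistent with the abstract's remark that decompression is fast and needs little work space; the repeated pair counting in the compressor likewise reflects why compression is comparatively slow.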
Added 02 Aug 2010
Updated 02 Aug 2010
Type Conference
Year 2000
Where CIAC
Authors Yusuke Shibata, Takuya Kida, Shuichi Fukamachi, Masayuki Takeda, Ayumi Shinohara, Takeshi Shinohara, Setsuo Arikawa