
DCC
2007
IEEE

Simple Linear-Time Off-Line Text Compression by Longest-First Substitution

We consider grammar-based text compression with longest-first substitution, where non-overlapping occurrences of a longest repeating substring of the input text are replaced by a new non-terminal symbol. We present a new text compression algorithm by simplifying the algorithm presented in [4]. We give a new formulation of the correctness proof by introducing the sparse lazy suffix tree data structure. We also present another type of longest-first substitution strategy that allows better compression. We show results of preliminary experiments comparing the grammar sizes of the two versions of the longest-first strategy and the most-frequent strategy.
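The longest-first substitution idea described in the abstract can be illustrated with a small sketch. This is not the paper's linear-time algorithm and it does not use a sparse lazy suffix tree; it is a naive brute-force version that rescans the sequence in every round. All function names (`longest_nonoverlapping_repeat`, `replace_occurrences`, `longest_first_grammar`, `expand`) and the tuple encoding of non-terminals are assumptions made for illustration only. It also assumes that ties between equally long repeats may be broken arbitrarily and that only repeats of length at least 2 are worth replacing.

```python
def longest_nonoverlapping_repeat(seq):
    """Return a longest factor of seq (length >= 2) that has at least two
    non-overlapping occurrences, as a tuple, or None if there is none."""
    n = len(seq)
    for length in range(n // 2, 1, -1):  # try longer factors first
        first_seen = {}  # factor -> leftmost starting position
        for i in range(n - length + 1):
            factor = tuple(seq[i:i + length])
            j = first_seen.setdefault(factor, i)
            if i - j >= length:  # occurrence at i does not overlap the one at j
                return factor
    return None


def replace_occurrences(seq, factor, symbol):
    """Replace occurrences of factor with symbol, scanning greedily left to right."""
    out, i, k = [], 0, len(factor)
    while i < len(seq):
        if tuple(seq[i:i + k]) == factor:
            out.append(symbol)
            i += k
        else:
            out.append(seq[i])
            i += 1
    return out


def longest_first_grammar(text):
    """Naive longest-first substitution (illustrative, not linear time).

    Returns (start_sequence, rules). Non-terminals are tuples ("N", k), an
    encoding chosen here so they cannot collide with input characters."""
    seq = list(text)
    rules = {}
    while True:
        factor = longest_nonoverlapping_repeat(seq)
        if factor is None:
            break
        symbol = ("N", len(rules) + 1)
        rules[symbol] = list(factor)
        seq = replace_occurrences(seq, factor, symbol)
    return seq, rules


def expand(seq, rules):
    """Expand a sequence of terminals and non-terminals back into a string."""
    parts = []
    for s in seq:
        parts.append(expand(rules[s], rules) if s in rules else s)
    return "".join(parts)


if __name__ == "__main__":
    start, rules = longest_first_grammar("abcabcabc")
    print(start, rules)
    assert expand(start, rules) == "abcabcabc"
```

In this sketch, repeats are searched only in the current working sequence, and each round costs quadratic time or worse, which is exactly the overhead a suffix-tree-based method is meant to avoid.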
Ryosuke Nakamura, Hideo Bannai, Shunsuke Inenaga, Masayuki Takeda
Added 25 Dec 2009
Updated 25 Dec 2009
Type Conference
Year 2007
Where DCC
Authors Ryosuke Nakamura, Hideo Bannai, Shunsuke Inenaga, Masayuki Takeda