Sciweavers

DCC
2010
IEEE

Lossless Data Compression via Substring Enumeration

We present a technique that compresses a string w by enumerating all the substrings of w. The substrings are enumerated from the shortest to the longest and in lexicographic order. Compression is obtained from the fact that the set of the substrings of a particular length gives a lot of information about the substrings that are one bit longer. A linear-time, linear-space algorithm is presented. Experimental results show that the compression efficiency comes close to that of the best PPM variants. Other compression techniques are compared to ours.

1 Basic Idea

We propose a technique of lossless data compression via substring enumeration (CSE) that compresses a string of bits D in three steps: first, it builds a tree that counts the number of occurrences of each of D’s substrings, while considering D to be circular; second, it enumerates all of D’s substrings, from the shortest to the longest and in lexicographic order; and third, it indicates which of the full-length substrings i...
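The first two steps can be illustrated with a minimal Python sketch. It is a naive quadratic-time illustration, not the linear-time, tree-based algorithm the abstract refers to, and the function names `circular_substring_counts` and `enumerate_substrings` are hypothetical. It counts the occurrences of every substring of a circular bit string and lists them from shortest to longest in lexicographic order. It also shows why the substrings of one length say so much about those one bit longer: in a circular string, every occurrence of a substring w is followed by exactly one bit, so the count of w equals the count of w0 plus the count of w1.

```python
from collections import Counter


def circular_substring_counts(bits):
    """Map each length k (1..n) to a Counter of length-k substrings,
    treating `bits` as circular. Naive O(n^2) sketch, for illustration only."""
    n = len(bits)
    doubled = bits + bits  # lets a window wrap around the end of the string
    return {
        k: Counter(doubled[i:i + k] for i in range(n))
        for k in range(1, n + 1)
    }


def enumerate_substrings(bits):
    """Yield (substring, count) from shortest to longest, and in
    lexicographic order within each length."""
    counts = circular_substring_counts(bits)
    for k in sorted(counts):
        for sub in sorted(counts[k]):
            yield sub, counts[k][sub]


def extension_counts_consistent(bits):
    """Check that count(w) == count(w + '0') + count(w + '1') for every
    substring w shorter than the full string."""
    counts = circular_substring_counts(bits)
    n = len(bits)
    return all(
        c == counts[k + 1][w + "0"] + counts[k + 1][w + "1"]
        for k in range(1, n)
        for w, c in counts[k].items()
    )


if __name__ == "__main__":
    for sub, count in enumerate_substrings("0110"):
        print(sub, count)
    print(extension_counts_consistent("0110"))
```

Because of this constraint, once the counts for length k are known, an encoder only has to resolve how each count splits between the two one-bit extensions, which is the redundancy the abstract describes.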
Danny Dubé, Vincent Beaudoin
Added 17 May 2010
Updated 17 May 2010
Type Conference
Year 2010
Where DCC
Authors Danny Dubé, Vincent Beaudoin