Sciweavers

DCC
2004
IEEE

An Approximation to the Greedy Algorithm for Differential Compression of Very Large Files

We present a new differential compression algorithm that combines the hash-value and suffix-array techniques of previous work. Differential compression refers to encoding a file (the version file) as a set of changes with respect to another file (the reference file). Previous differential compression algorithms can be shown empirically to run in linear time, but they have a notable drawback: they do not find the best matches for every offset of the version file. Our algorithm finds the best matches for every offset of the version file, with respect to a certain granularity (or block size) and above a certain length threshold. It has two variations, depending on how we choose the block size. If we keep the block size fixed, we show that the compression performance of our algorithm is similar to that of the greedy algorithm, without the expensive space and time requirements. If we vary the block size linearly with the reference file size, we show that our algorithm can run in...
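To make the idea of encoding a version file against a reference file concrete, here is a minimal Python sketch of block-hash differencing. It is not the algorithm of the paper, which also uses suffix arrays and looks for the best match at every offset; this sketch simply takes the first indexed block that matches and extends it forward. The names `encode_delta`, `apply_delta`, and `block_size`, as well as the `("copy", ...)` / `("add", ...)` command format, are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def encode_delta(reference: bytes, version: bytes, block_size: int = 4) -> List[Tuple]:
    """Encode `version` as copy/add commands relative to `reference`.

    Illustrative only: a first-match block-hash scheme, not the paper's method.
    """
    # Index every block-aligned chunk of the reference by its content.
    # A dict keyed on the bytes stands in for a hash table of block hashes.
    index: Dict[bytes, int] = {}
    for off in range(0, len(reference) - block_size + 1, block_size):
        index.setdefault(reference[off:off + block_size], off)

    delta: List[Tuple] = []
    literal = bytearray()  # bytes of the version file with no match so far
    pos = 0
    while pos < len(version):
        ref_off = index.get(version[pos:pos + block_size])
        if ref_off is None:
            # No reference block starts with these bytes: emit one literal byte.
            literal.append(version[pos])
            pos += 1
            continue
        # Extend the block-sized match forward as far as the two files agree.
        length = block_size
        while (ref_off + length < len(reference)
               and pos + length < len(version)
               and reference[ref_off + length] == version[pos + length]):
            length += 1
        if literal:
            delta.append(("add", bytes(literal)))
            literal.clear()
        delta.append(("copy", ref_off, length))
        pos += length
    if literal:
        delta.append(("add", bytes(literal)))
    return delta


def apply_delta(reference: bytes, delta: List[Tuple]) -> bytes:
    """Rebuild the version file from the reference file and the delta."""
    out = bytearray()
    for cmd in delta:
        if cmd[0] == "copy":
            _, off, length = cmd
            out += reference[off:off + length]
        else:
            out += cmd[1]
    return bytes(out)
```

A usage example under the same assumptions: `apply_delta(ref, encode_delta(ref, ver))` should reproduce `ver` for any pair of byte strings. A larger `block_size` shrinks the index but misses matches shorter than one block, which is the granularity trade-off the abstract refers to.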
Added 25 Dec 2009
Updated 25 Dec 2009
Type Conference
Year 2004
Where DCC
Authors Ramesh C. Agarwal, Suchitra Amalapurapu, Shaili Jain