Sciweavers

WISE
2002
Springer

Cluster-Based Delta Compression of a Collection of Files

13 years 9 months ago
Cluster-Based Delta Compression of a Collection of Files
Delta compression techniques are commonly used to succinctly represent an updated version of a file with respect to an earlier one. In this paper, we study the use of delta compression in a somewhat different scenario, where we wish to compress a large collection of (more or less) related files by performing a sequence of pairwise delta compressions. The problem of finding an optimal delta encoding for a collection of files by taking pairwise deltas can be reduced to the problem of computing a branching of maximum weight in a weighted directed graph, but this solution is inefficient and thus does not scale to larger file collections. This motivates us to propose a framework for clusterbased delta compression that uses text clustering techniques to prune the graph of possible pairwise delta encodings. To demonstrate the efficacy of our approach, we present experimental results on collections of web pages. Our experiments show that cluster-based delta compression of collections p...
Zan Ouyang, Nasir D. Memon, Torsten Suel, Dimitre
Added 16 Jul 2010
Updated 16 Jul 2010
Type Conference
Year 2002
Where WISE
Authors Zan Ouyang, Nasir D. Memon, Torsten Suel, Dimitre Trendafilov
Comments (0)