VLDB
2007
ACM

Measuring the Structural Similarity of Semistructured Documents Using Entropy

We propose a technique for measuring the structural similarity of semistructured documents based on entropy. After extracting the structural information from two documents, we use either Ziv-Lempel encoding or Ziv-Merhav cross-parsing to determine the entropy and, from it, the similarity between the documents. To the best of our knowledge, this is the first truly linear-time approach for evaluating structural similarity. In an experimental evaluation, we demonstrate that the clustering quality achieved by our algorithm is on a par with, or even better than, that of existing approaches.
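The idea in the abstract can be illustrated with a small sketch. It is not the paper's implementation: the tag-extraction regex, the function names (tag_sequence, cross_parse_count, structural_distance) and the normalization by sequence length are all assumptions made for illustration. The substring search is also deliberately naive, so this version is not linear-time; a linear-time variant would presumably need an index such as a suffix tree. The sketch reduces each document to its sequence of tag names, then counts how many phrases are needed to parse one sequence using only contiguous pieces of the other, in the spirit of Ziv-Merhav cross-parsing. Fewer phrases suggest more shared structure.

```python
import re


def tag_sequence(doc: str) -> list[str]:
    """Reduce a document to its tag names in order, dropping text and attributes.

    Closing tags keep their leading slash, e.g. "/b". (Hypothetical extraction
    step; the source does not specify how structure is extracted.)
    """
    pattern = r"<(/?)([A-Za-z_][\w.-]*)[^>]*>"
    return [m.group(1) + m.group(2) for m in re.finditer(pattern, doc)]


def occurs(x: list[str], piece: list[str]) -> bool:
    """Return True if `piece` appears as a contiguous run inside `x` (naive scan)."""
    m = len(piece)
    return any(x[j:j + m] == piece for j in range(len(x) - m + 1))


def cross_parse_count(z: list[str], x: list[str]) -> int:
    """Count the phrases needed to parse z using the longest pieces found in x.

    Each phrase is the longest prefix of the remaining part of z that occurs
    contiguously in x; a symbol that never occurs in x forms its own phrase.
    """
    count = 0
    i = 0
    n = len(z)
    while i < n:
        length = 0
        while i + length < n and occurs(x, z[i:i + length + 1]):
            length += 1
        i += max(length, 1)  # always advance by at least one symbol
        count += 1
    return count


def structural_distance(doc_a: str, doc_b: str) -> float:
    """Symmetric phrase-count distance (assumed normalization, not the paper's)."""
    a, b = tag_sequence(doc_a), tag_sequence(doc_b)
    if not a or not b:
        return 0.0 if a == b else 1.0
    return 0.5 * (cross_parse_count(a, b) / len(a) + cross_parse_count(b, a) / len(b))
```

Under these assumptions, two documents with identical tag structure but different text content yield a distance close to zero, while documents sharing no tag names yield a distance of 1. The resulting pairwise values could then be fed to a clustering algorithm, which is how the abstract describes the evaluation.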
Sven Helmer
Added 05 Dec 2009
Updated 05 Dec 2009
Type Conference
Year 2007
Where VLDB
Authors Sven Helmer