Sciweavers

VLDB 2007, ACM

Measuring the Structural Similarity of Semistructured Documents Using Entropy

We propose a technique for measuring the structural similarity of semistructured documents based on entropy. After extracting the structural information from two documents, we use either Ziv-Lempel encoding or Ziv-Merhav crossparsing to determine the entropy and, consequently, the similarity between the documents. To the best of our knowledge, this is the first true linear-time approach for evaluating structural similarity. In an experimental evaluation, we demonstrate that the clustering quality achieved by our algorithm is on a par with, or even better than, that of existing approaches.
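The abstract does not spell out the algorithm, so the following is only a rough sketch of the general idea under stated assumptions. It assumes that the "structural information" is the sequence of opening and closing tag names, that an LZ78-style phrase count can stand in for the entropy estimate, and that a normalized-compression-distance-style formula turns those counts into a distance. The function names (`tag_sequence`, `lz78_phrase_count`, `structural_distance`) and the distance formula are illustrative choices, not taken from the paper.

```python
import re

def tag_sequence(xml_text):
    """Extract tag names in document order, dropping text and attributes.
    Closing tags keep a leading '/' so nesting is still visible."""
    pattern = r"<(/?)([A-Za-z_][\w.-]*)"
    return [m.group(1) + m.group(2) for m in re.finditer(pattern, xml_text)]

def lz78_phrase_count(symbols):
    """Count the phrases of an LZ78-style incremental parse.
    A trie stored as {(node_id, symbol): child_id} keeps this to one
    dictionary lookup per input symbol."""
    trie = {}
    node = 0       # 0 is the trie root (empty phrase)
    next_id = 1
    count = 0
    for s in symbols:
        key = (node, s)
        if key in trie:
            node = trie[key]      # keep extending the current phrase
        else:
            trie[key] = next_id   # new phrase: record it and restart
            next_id += 1
            count += 1
            node = 0
    if node != 0:
        count += 1                # trailing, incomplete phrase
    return count

def structural_distance(doc_x, doc_y):
    """Assumed distance: how many extra phrases are needed to parse the
    concatenation, relative to the larger individual phrase count.
    Smaller values suggest more shared structure."""
    x, y = tag_sequence(doc_x), tag_sequence(doc_y)
    cx, cy = lz78_phrase_count(x), lz78_phrase_count(y)
    if max(cx, cy) == 0:
        return 0.0
    cxy = lz78_phrase_count(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

# Hypothetical toy documents: a and b share a structure, c does not.
doc_a = "<r>" + "<i><n/></i>" * 8 + "</r>"
doc_b = "<r>" + "<i><n/></i>" * 6 + "</r>"
doc_c = "<x>" + "<p><q/><s/></p>" * 8 + "</x>"
print(structural_distance(doc_a, doc_b), structural_distance(doc_a, doc_c))
```

On these toy inputs the two documents with the same repeating element pattern come out closer to each other than to the structurally unrelated one. On short sequences such as these the phrase counts are coarse, so the estimate is only indicative.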
Sven Helmer
Added: 05 Dec 2009
Updated: 05 Dec 2009
Type: Conference
Year: 2007
Where: VLDB
Authors: Sven Helmer