We propose a technique for measuring the structural similarity of semistructured documents based on entropy. After extracting the structural information from two documents we use ...
In this work, we study similarity measures for text-centric XML documents based on an extended vector space model, which considers both document content and structure. Experimenta...
Abstract. In this work we propose a fuzzy technique to compare XML documents belonging to a semi-structured flow and sharing a common vocabulary of tags. Our approach is based on t...
Paolo Ceravolo, Maria Cristina Nocerino, Marco Viv...
Document clustering techniques mostly rely on single term analysis of the document data set, such as the Vector Space Model. To better capture the structure of documents, the unde...
We propose a preprocessing method for Web mining which, given semi-structured documents with the same structure and style, distinguishes useless parts and non-useless parts in each...