Sciweavers

VLDB 2007 (ACM)
Measuring the Structural Similarity of Semistructured Documents Using Entropy
We propose a technique for measuring the structural similarity of semistructured documents based on entropy. After extracting the structural information from two documents we use ...
Sven Helmer
SIGIR 2006 (ACM)
Measuring similarity of semi-structured documents with context weights
In this work, we study similarity measures for text-centric XML documents based on an extended vector space model, which considers both document content and structure. Experimenta...
Christopher C. Yang, Nan Liu
KES 2004 (Springer)
Knowledge Extraction from Semi-structured Data Based on Fuzzy Techniques
In this work we propose a fuzzy technique to compare XML documents belonging to a semi-structured flow and sharing a common vocabulary of tags. Our approach is based on t...
Paolo Ceravolo, Maria Cristina Nocerino, Marco Viv...
ICDM 2002 (IEEE)
Phrase-based Document Similarity Based on an Index Graph Model
Document clustering techniques mostly rely on single term analysis of the document data set, such as the Vector Space Model. To better capture the structure of documents, the unde...
Khaled M. Hammouda, Mohamed S. Kamel
DIS 2001 (Springer)
Eliminating Useless Parts in Semi-structured Documents Using Alternation Counts
We propose a preprocessing method for Web mining which, given semi-structured documents with the same structure and style, distinguishes useless parts and non-useless parts in each...
Daisuke Ikeda, Yasuhiro Yamada, Sachio Hirokawa