Sciweavers

22 search results - page 5 / 5
» Detecting Changes to Hybrid XML Documents Using Relational D...
KDD
2004
ACM
195 views, Data Mining
14 years 5 months ago
Improved robustness of signature-based near-replica detection via lexicon randomization
Detection of near duplicate documents is an important problem in many data mining and information filtering applications. When faced with massive quantities of data, traditional d...
Aleksander Kolcz, Abdur Chowdhury, Joshua Alspecto...
ICDM
2006
IEEE
176 views, Data Mining
13 years 11 months ago
Razor: mining distance-constrained embedded subtrees
Due to their capability for expressing semantics and relationships among data objects, semi-structured documents have become a common way of representing domain knowledge. Compari...
Henry Tan, Tharam S. Dillon, Fedja Hadzic, Elizabe...