Evaluating Performance and Quality of XML-Based Similarity Joins

A similarity join that correlates fragments of XML documents that are similar in structure and content can serve as the core algorithm for data cleaning and data integration tasks. For this reason, built-in support for such an operator in an XML database management system (XDBMS) is very attractive. However, similarity assessment is especially difficult on XML datasets because, besides textual information, the structure of XML documents representing the same real-world entity may also vary. Moreover, similarity computation is considerably more expensive for tree-structured objects and should therefore be a prime optimization candidate. In this paper, we explore and optimize tree-based similarity joins and analyze their performance and accuracy when embedded in native XDBMSs.
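To make the operator concrete, here is a minimal Python sketch of a naive nested-loop similarity join over XML fragments. It is not the authors' method. The token representation is a hypothetical choice for illustration: root-to-node label paths serve as structure tokens, and lowercase whitespace-separated words serve as content tokens. The Jaccard measure, the threshold, and the function names `tokens`, `jaccard`, and `similarity_join` are likewise hypothetical. The all-pairs loop also hints at why the computation is a prime optimization target, since every left fragment is compared with every right fragment.

```python
import xml.etree.ElementTree as ET


def tokens(fragment):
    """Hypothetical representation: label paths (structure) plus words (content)."""
    root = ET.fromstring(fragment)
    out = set()

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        out.add("p:" + path)  # structure token, e.g. "p:/book/title"
        for word in (node.text or "").lower().split():
            out.add("w:" + word)  # content token; tail text is ignored here
        for child in node:
            walk(child, path)

    walk(root, "")
    return out


def jaccard(a, b):
    """Jaccard similarity of two token sets (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity_join(left, right, threshold):
    """Naive nested-loop join: return (i, j, score) for pairs with score >= threshold."""
    left_tokens = [tokens(f) for f in left]
    right_tokens = [tokens(f) for f in right]
    result = []
    for i, a in enumerate(left_tokens):
        for j, b in enumerate(right_tokens):
            score = jaccard(a, b)
            if score >= threshold:
                result.append((i, j, score))
    return result


left = ["<book><title>XML Databases</title><author>Smith</author></book>"]
right = [
    "<book><title>XML databases</title><writer>Smith</writer></book>",
    "<book><title>Cooking</title></book>",
]
print(similarity_join(left, right, 0.5))
```

In this toy input, the record that uses `<writer>` instead of `<author>` still matches with a score of 5/7 (about 0.71) despite the structural variation, while the unrelated book scores 2/7 and is filtered out.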
Leonardo Ribeiro, Theo Härder
Added 01 Jun 2010
Updated 01 Jun 2010
Type Conference
Year 2008