Duplicate detection determines different representations of realworld objects in a database. Recent research has considered the use of relationships among object representations t...
Many data are modeled as tensors, or multi dimensional arrays. Examples include the predicates (subject, verb, object) in knowledge bases, hyperlinks and anchor texts in the Web g...
U. Kang, Evangelos E. Papalexakis, Abhay Harpale, ...
Duplicate elimination is an important stage in integrating data from multiple sources. The challenges involved are finding a robust deduplication function that can identify when t...
Code clones in software increase maintenance cost and lower software quality. We have devised a new algorithm to detect duplicated parts of source code in large software. Our algo...
With the increasing amount of data and the need to integrate data from multiple data sources, a challenging issue is to find near duplicate records efficiently. In this paper, we ...
Chuan Xiao, Wei Wang 0011, Xuemin Lin, Jeffrey Xu ...