Code clones in software increase maintenance cost and lower software quality. We have devised a new algorithm to detect duplicated parts of source code in large software. Our algo...
Code duplication is an important problem in application maintenance. Tools exist that support code duplication detection. However, few of them propose a solution for the problem, ...
This paper introduces a framework for clarifying and formalizing the duplicate document detection problem. Four distinct models are presented, each with a corresponding algorithm ...
: In this paper, we will propose PC-Filter (PC stands for Partition Comparison), a robust data filter for approximately duplicate record detection in large databases. PC-Filter dis...
Ji Zhang, Tok Wang Ling, Robert M. Bruckner, Han L...
The scalability of graph-search algorithms can be greatly extended by using external memory, such as disk, to store generated nodes. We consider structured duplicate detection, an...