This paper presents a method for measuring the semantic similarity of texts, using corpus-based and knowledge-based measures of similarity. Previous work on this problem has focus...
As online document collections continue to expand, both on the Web and in proprietary environments, the need for duplicate detection becomes more critical. The goal of this work i...
Vocabulary incompatibilities arise when the terms used to index a document collection are largely unknown, or at least not well-known to the users who eventually search the collec...
James C. French, Allison L. Powell, Fredric C. Gey...
The use of bad names (names that are wrong, inconsistent or inconcise) hinders program comprehension. The root of the problem is that there is no mechanism for aligning the n...
As the most pervasive method of individual identification and document authentication, signatures present convincing evidence and provide an important form of indexing for effectiv...