A framework is presented for discovering partial duplicates in large collections of scanned books with optical character recognition (OCR) errors. Each book in the collection is r...
Object matching or object consolidation is a crucial task for data integration and data cleaning. It addresses the problem of identifying object instances in data sources referrin...
Most data integration applications require a matching between the schemas of the respective data sets. We show how the existence of duplicates within these data sets can be exploi...
Rapid growth of internet applications has increased the importance of intrusion detection system (IDS) performance. String matching is the most computation-consuming task in IDS. ...
Duplicate detection determines different representations of realworld objects in a database. Recent research has considered the use of relationships among object representations t...