Sciweavers

103 search results - page 3 / 21
» Models and Algorithms for Duplicate Document Detection
ICAIL
2007
ACM
13 years 9 months ago
Essential deduplication functions for transactional databases in law firms
As massive document repositories and knowledge management systems continue to expand, in proprietary environments as well as on the Web, the need for duplicate detection becomes i...
Jack G. Conrad, Edward L. Raymond
GIS
2010
ACM
13 years 4 months ago
Detecting nearly duplicated records in location datasets
The quality of a local search engine, such as Google and Bing Maps, heavily relies on its geographic datasets. Typically, these datasets are obtained from multiple sources, e.g., ...
Yu Zheng, Xixuan Fen, Xing Xie, Shuang Peng, James...
DAS
1998
Springer
13 years 10 months ago
Group 4 Compressed Document Matching
Numerous approaches, including textual, structural and featural, to detecting duplicate documents have been investigated. Considering document images are usually stored and transm...
Dar-Shyang Lee, Jonathan J. Hull
KDD
2004
ACM
14 years 6 months ago
Improved robustness of signature-based near-replica detection via lexicon randomization
Detection of near duplicate documents is an important problem in many data mining and information filtering applications. When faced with massive quantities of data, traditional d...
Aleksander Kolcz, Abdur Chowdhury, Joshua Alspecto...
RECOMB
2006
Springer
14 years 6 months ago
Evolution of Tandemly Repeated Sequences Through Duplication and Inversion
Given a phylogenetic tree T for a family of tandemly repeated genes and their signed order O on the chromosome, we aim to find the minimum number of inversions compatible...
Denis Bertrand, Mathieu Lajoie, Nadia El-Mabrouk, ...