Sciweavers

3 search results - page 1 / 1
SIGMOD
2010
ACM
269 views, category: Database
13 years 4 months ago
MapDupReducer: detecting near duplicates over massive datasets
Chaokun Wang, Jianmin Wang, Xuemin Lin, Wei Wang, ...
WWW
2008
ACM
14 years 5 months ago
Efficient similarity joins for near duplicate detection
With the increasing amount of data and the need to integrate data from multiple data sources, a challenging issue is to find near duplicate records efficiently. In this paper, we ...
Chuan Xiao, Wei Wang 0011, Xuemin Lin, Jeffrey Xu ...
KDD
2004
ACM
195 views, category: Data Mining
14 years 4 months ago
Improved robustness of signature-based near-replica detection via lexicon randomization
Detection of near duplicate documents is an important problem in many data mining and information filtering applications. When faced with massive quantities of data, traditional d...
Aleksander Kolcz, Abdur Chowdhury, Joshua Alspecto...