Sciweavers

Search results for "Detecting nearly duplicated records in location datasets"

GIS 2010 (ACM)
Detecting nearly duplicated records in location datasets
The quality of a local search engine, such as Google and Bing Maps, heavily relies on its geographic datasets. Typically, these datasets are obtained from multiple sources, e.g., ...
Yu Zheng, Xixuan Fen, Xing Xie, Shuang Peng, James...

WWW 2008 (ACM)
Efficient similarity joins for near duplicate detection
With the increasing amount of data and the need to integrate data from multiple data sources, a challenging issue is to find near duplicate records efficiently. In this paper, we ...
Chuan Xiao, Wei Wang 0011, Xuemin Lin, Jeffrey Xu ...

SIGMOD 2010 (ACM), category: Database
MapDupReducer: detecting near duplicates over massive datasets
Chaokun Wang, Jianmin Wang, Xuemin Lin, Wei Wang, ...

P2P 2010 (IEEE), category: Communications
Optimizing Near Duplicate Detection for P2P Networks
In this paper, we propose a probabilistic algorithm for detecting near duplicate text, audio, and video resources efficiently and effectively in large-scale P2P systems. To thi...
Odysseas Papapetrou, Sukriti Ramesh, Stefan Siersd...

KDD 2003 (ACM), category: Data Mining
Adaptive duplicate detection using learnable string similarity measures
The problem of identifying approximately duplicate records in databases is an essential step for data cleaning and data integration processes. Most existing approaches have relied...
Mikhail Bilenko, Raymond J. Mooney