Sciweavers

260 search results - page 8 / 52
» Industry-scale duplicate detection
SIGIR
2004
ACM
15 years 3 months ago
Constructing a text corpus for inexact duplicate detection
As online document collections continue to expand, both on the Web and in proprietary environments, the need for duplicate detection becomes more critical. The goal of this work i...
Jack G. Conrad, Cindy P. Schriber
PVLDB
2008
14 years 9 months ago
Industry-scale duplicate detection
Duplicate detection is the process of identifying multiple representations of the same real-world object in a data source. Duplicate detection is a problem of critical importance in...
Melanie Weis, Felix Naumann, Ulrich Jehle, Jens Lu...
WWW
2005
ACM
15 years 10 months ago
Duplicate detection in click streams
We consider the problem of finding duplicates in data streams. Duplicate detection in data streams is utilized in various applications including fraud detection. We develop a solu...
Ahmed Metwally, Divyakant Agrawal, Amr El Abbadi
KDD
2005
ACM
15 years 10 months ago
A hit-miss model for duplicate detection in the WHO drug safety database
The WHO Collaborating Centre for International Drug Monitoring in Uppsala, Sweden, maintains and analyses the world's largest database of reports on suspected adverse drug re...
Andrew Bate, G. Niklas Norén, Roland Orre
ICSE
2008
IEEE-ACM
15 years 10 months ago
An approach to detecting duplicate bug reports using natural language and execution information
An open source project typically maintains an open bug repository so that bug reports from all over the world can be gathered. When a new bug report is submitted to the repository...
Xiaoyin Wang, Lu Zhang, Tao Xie, John Anvik, Jiasu...