Sciweavers

DEXA
2004
Springer
PC-Filter: A Robust Filtering Technique for Duplicate Record Detection in Large Databases
In this paper, we propose PC-Filter (PC stands for Partition Comparison), a robust data filter for approximately duplicate record detection in large databases. PC-Filter dis...
Ji Zhang, Tok Wang Ling, Robert M. Bruckner, Han L...
TCSV
2010
Efficient and Robust Detection of Duplicate Videos in a Large Database
We present an efficient and accurate method for duplicate video detection in a large database using video fingerprints. We have empirically chosen the Color Layout Descriptor, a c...
Anindya Sarkar, Vishwakarma Singh, Pratim Ghosh, B...
KDD
2004
ACM
Improved robustness of signature-based near-replica detection via lexicon randomization
Detection of near duplicate documents is an important problem in many data mining and information filtering applications. When faced with massive quantities of data, traditional d...
Aleksander Kolcz, Abdur Chowdhury, Joshua Alspecto...
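The snippet above concerns signature-based near-duplicate detection. As a rough illustration only, the sketch below assumes an I-Match-style signature (a hash of the sorted set of a document's terms that appear in a lexicon) and models lexicon randomization as extra lexicons built by randomly dropping a fraction of the original terms. Two documents count as near-duplicates if their signatures agree under any single lexicon. The lexicon contents and the copies, drop_fraction, and seed parameters are hypothetical choices, not values taken from the paper.

```python
import hashlib
import random


def signature(tokens, lexicon):
    """Hash the sorted set of tokens that appear in the lexicon (None if none do)."""
    kept = sorted(set(tokens) & lexicon)
    if not kept:
        return None  # too little evidence to fingerprint this document
    return hashlib.sha1(" ".join(kept).encode("utf-8")).hexdigest()


def randomized_lexicons(lexicon, copies=3, drop_fraction=0.3, seed=0):
    """Return the original lexicon followed by `copies` randomly thinned versions."""
    rng = random.Random(seed)
    words = sorted(lexicon)  # sort so sampling is reproducible for a given seed
    n_keep = len(words) - int(drop_fraction * len(words))
    return [set(lexicon)] + [set(rng.sample(words, n_keep)) for _ in range(copies)]


def near_duplicates(text_a, text_b, lexicons):
    """True if the two texts get the same signature under any single lexicon."""
    tokens_a = text_a.lower().split()
    tokens_b = text_b.lower().split()
    for lexicon in lexicons:
        sig_a = signature(tokens_a, lexicon)
        if sig_a is not None and sig_a == signature(tokens_b, lexicon):
            return True
    return False


# Hypothetical lexicon; a real one would be derived from collection statistics.
lexicons = randomized_lexicons(
    {"robust", "filter", "duplicate", "record", "detection", "large", "database"},
    copies=4,
)
print(near_duplicates("robust filter for duplicate record detection",
                      "A robust filter in duplicate record detection", lexicons))  # True
```

Dropping terms makes a signature tolerant of edits that touch only the dropped terms, so several thinned copies give replicas that differ in one or two lexicon terms a chance to match under some copy, at the cost of admitting more false matches.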
CIKM
2009
Springer
Robust record linkage blocking using suffix arrays
Record linkage is an important data integration task that has many practical uses for matching, merging and duplicate removal in large and diverse databases. However, a quadratic ...
Timothy de Vries, Hui Ke, Sanjay Chawla, Peter Chr...
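The snippet points to the quadratic cost of comparing every pair of records, which blocking is meant to avoid. Below is a minimal sketch of suffix-based blocking under simplifying assumptions: every suffix of a record's blocking key that is at least min_len characters long becomes a block key, blocks larger than max_block are discarded as too common, and only records sharing a block become candidate pairs. The input shape, the min_len and max_block defaults, and the function names are hypothetical, and the paper's "robust" refinements are not modeled here.

```python
from collections import defaultdict
from itertools import combinations


def suffix_blocks(records, min_len=4, max_block=3):
    """Group record ids by the suffixes of their blocking keys.

    records: dict mapping a record id to its blocking key (hypothetical shape).
    Every suffix of length >= min_len becomes a block key; a key shorter
    than min_len is used whole. Blocks with more than max_block ids are
    dropped, because very common suffixes yield large, uninformative blocks.
    """
    blocks = defaultdict(set)
    for rid, key in records.items():
        key = key.lower().replace(" ", "")
        for start in range(max(1, len(key) - min_len + 1)):
            blocks[key[start:]].add(rid)
    return {suffix: ids for suffix, ids in blocks.items() if len(ids) <= max_block}


def candidate_pairs(blocks):
    """Return the record-id pairs that share at least one block."""
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


records = {1: "christen", 2: "kristen", 3: "christensen", 4: "chawla"}
print(sorted(candidate_pairs(suffix_blocks(records))))  # [(1, 2)]
```

Here "christen" and "kristen" share the suffixes "risten", "isten", and "sten", so only records 1 and 2 are compared, instead of all six pairs a naive pairwise scan would check.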
VLDB
2007
ACM
Detecting Attribute Dependencies from Query Feedback
Real-world datasets exhibit a complex dependency structure among the data attributes. Learning this structure is a key task in automatic statistics configuration for query optimi...
Peter J. Haas, Fabian Hueske, Volker Markl