Sciweavers

43 search results - page 2 / 9
» Efficient similarity joins for near duplicate detection
SSDBM
2010
IEEE
13 years 9 months ago
Prefix Tree Indexing for Similarity Search and Similarity Joins on Genomic Data
Similarity search and similarity join on strings are important for applications such as duplicate detection, error detection, data cleansing, or comparison of biological sequences....
Astrid Rheinländer, Martin Knobloch, Nicky Ho...
ICPR
2010
IEEE
13 years 2 months ago
Beyond "Near Duplicates": Learning Hash Codes for Efficient Similar-Image Retrieval
Finding similar images in a large database is an important, but often computationally expensive, task. In this paper, we present a two-tier similar-image retrieval system with the...
Shumeet Baluja, Michele Covell
ICPR
2010
IEEE
13 years 8 months ago
Beyond "Near-Duplicates": Learning Hash Codes for Efficient Similar-Image Retrieval
Finding similar images in a large database is an important, but often computationally expensive, task. In this paper, we present a two-tier similar-image retrieval system with the...
Shumeet Baluja, Michele Covell
WWW
2004
ACM
14 years 6 months ago
Web data integration using approximate string join
Web data integration is an important preprocessing step for web mining. It is highly likely that several records on the web whose textual representations differ may represent the ...
Yingping Huang, Gregory R. Madey
GIS
2010
ACM
13 years 4 months ago
Detecting nearly duplicated records in location datasets
The quality of a local search engine, such as Google and Bing Maps, heavily relies on its geographic datasets. Typically, these datasets are obtained from multiple sources, e.g., ...
Yu Zheng, Xixuan Fen, Xing Xie, Shuang Peng, James...