Sciweavers

84 search results for "Managing duplicates in a web archive"

ICS 2009, Tsinghua U.
R-ADMAD: high reliability provision for large-scale de-duplication archival storage systems
Data de-duplication has become a commodity component in data-intensive systems and it is required that these systems provide high reliability comparable to others. Unfortunately, b...
Chuanyi Liu, Yu Gu, Linchun Sun, Bin Yan, Dongshen...

ICMCS 2007, IEEE
SICO: A System for Detection of Near-Duplicate Images During Search
Duplicate and near-duplicate digital image matching is beneficial for image search in terms of collection management, digital content protection, and search efficiency. In this ...
Jun Jie Foo, Ranjan Sinha, Justin Zobel

ICAIL 2007, ACM
Essential deduplication functions for transactional databases in law firms
As massive document repositories and knowledge management systems continue to expand, in proprietary environments as well as on the Web, the need for duplicate detection becomes i...
Jack G. Conrad, Edward L. Raymond

SIGIR 2004, ACM
Constructing a text corpus for inexact duplicate detection
As online document collections continue to expand, both on the Web and in proprietary environments, the need for duplicate detection becomes more critical. The goal of this work i...
Jack G. Conrad, Cindy P. Schriber

DASFAA 2007, IEEE
Using Redundant Bit Vectors for Near-Duplicate Image Detection
Images are amongst the most widely proliferated forms of digital information due to affordable imaging technologies and the Web. In such an environment, the use of digital watermar...
Jun Jie Foo, Ranjan Sinha