Sciweavers

APWEB
2006
Springer
13 years 8 months ago
The Case of the Duplicate Documents: Measurement, Search, and Science
Many of the documents in large text collections are duplicates and versions of each other. In recent research, we developed new methods for finding such duplicates; however, as the...
Justin Zobel, Yaniv Bernstein
SIGIR
2004
ACM
13 years 10 months ago
Constructing a text corpus for inexact duplicate detection
As online document collections continue to expand, both on the Web and in proprietary environments, the need for duplicate detection becomes more critical. The goal of this work i...
Jack G. Conrad, Cindy P. Schriber
WWW
2007
ACM
14 years 5 months ago
Efficient search engine measurements
We address the problem of measuring global quality metrics of search engines, like corpus size, index freshness, and density of duplicates in the corpus. The recently proposed est...
Ziv Bar-Yossef, Maxim Gurevich
WWW
2009
ACM
14 years 5 months ago
Measuring the similarity between implicit semantic relations from the web
Measuring the similarity between semantic relations that hold among entities is an important and necessary step in various Web related tasks such as relation extraction, informati...
Danushka Bollegala, Yutaka Matsuo, Mitsuru Ishizuk...
KDD
2006
ACM
14 years 5 months ago
Understanding Content Reuse on the Web: Static and Dynamic Analyses
In this paper we present static and dynamic studies of duplicate and near-duplicate documents in the Web. The static and dynamic studies involve the analysis of similar c...
Ricardo A. Baeza-Yates, Álvaro R. Pereira J...