Efficient search engine measurements

We address the problem of measuring global quality metrics of search engines, like corpus size, index freshness, and density of duplicates in the corpus. The recently proposed estimators for such metrics [2, 6] suffer from significant bias and/or poor performance, due to inaccurate approximation of the so-called "document degrees". We present two new estimators that are able to overcome the bias introduced by approximate degrees. Our estimators are based on a careful implementation of an approximate importance sampling procedure. Comprehensive theoretical and empirical analysis of the estimators demonstrates that they have essentially no bias even in situations where document degrees are poorly approximated. Building on an idea from [6], we discuss Rao-Blackwellization as a generic method for reducing variance in search engine estimators. We show that Rao-Blackwellizing our estimators results in significant performance improvements, while not compromising accuracy. Categorie...
Ziv Bar-Yossef, Maxim Gurevich
Added 21 Nov 2009
Updated 21 Nov 2009
Type Conference
Year 2007
Where WWW