Sciweavers

SIGIR
2006
ACM

A statistical method for system evaluation using incomplete judgments

13 years 10 months ago
A statistical method for system evaluation using incomplete judgments
We consider the problem of large-scale retrieval evaluation, and we propose a statistical method for evaluating retrieval systems using incomplete judgments. Unlike existing techniques that (1) rely on effectively complete, and thus prohibitively expensive, relevance judgment sets, (2) produce biased estimates of standard performance measures, or (3) produce estimates of non-standard measures thought to be correlated with these standard measures, our proposed statistical technique produces unbiased estimates of the standard measures themselves. Our proposed technique is based on random sampling. While our estimates are unbiased by statistical design, their variance is dependent on the sampling distribution employed; as such, we derive a sampling distribution likely to yield low variance estimates. We test our proposed technique using benchmark TREC data, demonstrating that a sampling pool derived from a set of runs can be used to efficiently and effectively evaluate those runs. We fu...
Javed A. Aslam, Virgiliu Pavlu, Emine Yilmaz
Added 14 Jun 2010
Updated 14 Jun 2010
Type Conference
Year 2006
Where SIGIR
Authors Javed A. Aslam, Virgiliu Pavlu, Emine Yilmaz
Comments (0)