
SIGIR 2006, ACM

Minimal test collections for retrieval evaluation

Accurate estimation of information retrieval evaluation metrics such as average precision requires large sets of relevance judgments. Building sets large enough for evaluation of real-world implementations is at best inefficient, at worst infeasible. In this work we link evaluation with test collection construction to gain an understanding of the minimal judging effort that must be done to have high confidence in the outcome of an evaluation. A new way of looking at average precision leads to a natural algorithm for selecting documents to judge and allows us to estimate the degree of confidence by defining a distribution over possible document judgments. A study with annotators shows that this method can be used by a small group of researchers to rank a set of systems in under three hours with 95% confidence.
Categories and Subject Descriptors: H.3 Information Storage and Retrieval; H.3.4 Systems and Software: Performance Evaluation
General Terms: Algorithms, Measurement, Experimen...
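One concrete reading of the abstract's "new way of looking at average precision" is a decomposition of AP over pairs of relevant documents, so that the documents whose still-unknown judgments carry the most weight in the difference between two systems are judged first. The sketch below illustrates such a selection loop under that decomposition; it is a minimal, hypothetical Python illustration, not the authors' published algorithm. The toy rankings, the oracle judgments, and the "largest possible remaining impact" heuristic are assumptions made for the example.

```python
# Hypothetical sketch of judging the documents that matter most for the
# difference in average precision (AP) between two ranked lists.
# Not the authors' exact method; rankings and judgments below are made up.

def pair_weights(ranking):
    """AP times R (the number of relevant documents) can be written as a sum
    over pairs of relevant documents: each unordered pair {a, b} contributes
    1 / max(rank(a), rank(b)).  Return those pair weights for one ranking."""
    rank = {d: i + 1 for i, d in enumerate(ranking)}
    weights = {}
    for a in ranking:
        for b in ranking:
            if rank[a] <= rank[b]:
                weights[frozenset((a, b))] = 1.0 / rank[b]
    return weights


def delta_weights(run1, run2):
    """Per-pair contribution to AP(run1) - AP(run2).  The common 1/R factor
    is dropped because it does not change the sign of the difference."""
    w1, w2 = pair_weights(run1), pair_weights(run2)
    return {p: w1.get(p, 0.0) - w2.get(p, 0.0) for p in set(w1) | set(w2)}


def select_documents(run1, run2, oracle, budget):
    """Greedily judge the document with the largest possible remaining impact
    on the AP difference, up to the judging budget."""
    dw = delta_weights(run1, run2)
    docs = set(run1) | set(run2)
    judged = {}  # doc -> 0/1 relevance judgment obtained so far
    for _ in range(budget):
        candidates = docs - judged.keys()
        if not candidates:
            break

        def potential(d):
            # Sum |weight| of pairs involving d whose other member is not
            # already judged non-relevant (those pairs can still contribute).
            return sum(abs(w) for p, w in dw.items()
                       if d in p and all(judged.get(o, 1) == 1 for o in p))

        best = max(candidates, key=potential)
        judged[best] = oracle.get(best, 0)  # simulate asking an assessor
    # Estimated sign of the AP difference, treating unjudged docs as non-relevant.
    diff = sum(w for p, w in dw.items() if all(judged.get(d, 0) == 1 for d in p))
    return judged, diff


if __name__ == "__main__":
    run_a = ["d1", "d2", "d3", "d4", "d5"]
    run_b = ["d3", "d5", "d1", "d2", "d4"]
    oracle = {"d1": 1, "d3": 1, "d5": 0}  # hypothetical true judgments
    judged, diff = select_documents(run_a, run_b, oracle, budget=3)
    print("judged:", judged, "estimated AP difference (times R):", diff)
```

In this sketch the greedy criterion is simply the sum of absolute pair weights a document could still contribute; the paper, as the abstract notes, goes further by defining a distribution over the possible judgments of unjudged documents in order to quantify confidence in the outcome.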
Ben Carterette, James Allan, Ramesh K. Sitaraman
Type: Conference
Year: 2006
Where: SIGIR
Authors: Ben Carterette, James Allan, Ramesh K. Sitaraman