SIGIR 2006, ACM

Minimal test collections for retrieval evaluation

Accurate estimation of information retrieval evaluation metrics such as average precision requires large sets of relevance judgments. Building sets large enough for the evaluation of real-world implementations is at best inefficient, at worst infeasible. In this work we link evaluation with test collection construction to gain an understanding of the minimal judging effort that must be done to have high confidence in the outcome of an evaluation. A new way of looking at average precision leads to a natural algorithm for selecting documents to judge, and allows us to estimate the degree of confidence by defining a distribution over possible document judgments. A study with annotators shows that this method can be used by a small group of researchers to rank a set of systems in under three hours with 95% confidence.

Categories and Subject Descriptors: H.3 Information Storage and Retrieval; H.3.4 Systems and Software: Performance Evaluation
General Terms: Algorithms, Measurement, Experimentation
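The selection idea described in the abstract can be sketched in code. Viewing average precision as a sum over pairs of documents, each document carries a weight determined by its rank and the ranks of the other documents; a natural greedy strategy is to next judge the unjudged document whose weight differs most between the two systems being compared. The sketch below is a simplified illustration, not the paper's exact algorithm: the names `ap_weight` and `select_next` are assumptions, and it assumes both systems rank the same pool of documents.

```python
def ap_weight(ranking, doc):
    """Total weight of `doc` in the pairwise view of average precision
    for one ranked list: 1/rank(doc) for the self term, plus
    1/max(rank(doc), rank(j)) for every other document j."""
    rank = {d: i + 1 for i, d in enumerate(ranking)}
    pair_terms = sum(1.0 / max(rank[doc], rank[j])
                     for j in ranking if j != doc)
    return pair_terms + 1.0 / rank[doc]

def select_next(ranking_a, ranking_b, judged):
    """Greedy pick: the unjudged document whose AP weight differs most
    between the two rankings, i.e. the one whose judgment could move
    the AP difference the most (a simplified selection heuristic)."""
    pool = [d for d in ranking_a if d not in judged]
    return max(pool, key=lambda d: abs(ap_weight(ranking_a, d)
                                       - ap_weight(ranking_b, d)))

# Two systems that rank the same four documents in opposite orders:
# the documents at the extremes of the disagreement are selected first.
pick = select_next(["d1", "d2", "d3", "d4"],
                   ["d4", "d3", "d2", "d1"], judged=set())
```

Judging the selected document, adding it to `judged`, and repeating yields an incremental judging loop; the paper's confidence estimate would then come from a distribution over the still-unjudged documents.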
Ben Carterette, James Allan, Ramesh K. Sitaraman