SIGIR
2006
ACM

Statistical precision of information retrieval evaluation

We introduce and validate bootstrap techniques to compute confidence intervals that quantify the effect of test-collection variability on the average precision (AP) and mean average precision (MAP) IR effectiveness measures. We treat the test collection in IR evaluation as representative of a population of materially similar collections, whose documents are drawn from an infinite pool with similar characteristics. Our model accurately predicts the degree of concordance between system results on randomly selected halves of the TREC-6 ad hoc corpus. We advance a framework for statistical evaluation that applies the same approach to model other sources of chance variation, providing input for meta-analysis techniques.

Categories and Subject Descriptors: H.3.3 [Information Search and Retrieval]: Systems and Software – performance evaluation
General Terms: Experimentation, Measurement
Keywords: bootstrap, confidence interval, precision
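To illustrate the kind of computation the abstract describes, here is a minimal sketch of a percentile-bootstrap confidence interval for MAP. It resamples per-topic AP scores with replacement, which is an illustrative simplification: the paper itself models variability of the test collection (its documents), not just the topic sample, and the function and parameter names below are hypothetical.

```python
import random

def bootstrap_ci_map(ap_scores, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for MAP.

    ap_scores: per-topic average-precision values for one system.
    Returns (lo, hi), an approximate (1 - alpha) confidence interval.
    Resampling topics is an illustrative choice, not the paper's exact model.
    """
    rng = random.Random(seed)
    n = len(ap_scores)
    # Each bootstrap replicate: resample n topics with replacement, take the mean AP.
    maps = sorted(
        sum(rng.choice(ap_scores) for _ in range(n)) / n
        for _ in range(n_resamples)
    )
    # Take the alpha/2 and 1 - alpha/2 empirical percentiles.
    lo = maps[int((alpha / 2) * n_resamples)]
    hi = maps[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

if __name__ == "__main__":
    ap = [0.20, 0.40, 0.35, 0.50, 0.10, 0.60]
    print(bootstrap_ci_map(ap))
```

The percentile method is the simplest bootstrap interval; bias-corrected variants (BCa) are often preferred in practice when the statistic's distribution is skewed.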
Gordon V. Cormack, Thomas R. Lynam
Added 14 Jun 2010
Updated 14 Jun 2010
Type Conference
Year 2006
Where SIGIR
Authors Gordon V. Cormack, Thomas R. Lynam