SIGIR
2000
ACM

Evaluating evaluation measure stability

Abstract: This paper presents a novel way of examining the accuracy of the evaluation measures commonly used in information retrieval experiments. It validates several of the rules of thumb experimenters use, such as that the number of queries needed for a good experiment is at least 25 and that 50 is better, while challenging other beliefs, such as that the common evaluation measures are equally reliable. As an example, we show that Precision at 30 documents has about twice the average error rate of Average Precision. These results can help information retrieval researchers design experiments that provide a desired level of confidence in their results. In particular, we suggest that researchers using Web measures such as Precision at 10 documents will need to use many more than 50 queries, or will have to require two methods to have a very large difference in evaluation scores, before concluding that the two methods are actually different.
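
For readers unfamiliar with the two measures the abstract compares, the following is a minimal sketch of their standard textbook definitions, computed for a single query. It is not code from the paper, and it does not reproduce the paper's error-rate experiment. The function names `precision_at_k` and `average_precision`, and the boolean-list input format, are assumptions made here for illustration.

```python
from typing import Sequence


def precision_at_k(relevance: Sequence[bool], k: int) -> float:
    """Fraction of the top-k ranked documents that are relevant.

    `relevance[i]` is True if the document at rank i+1 is relevant.
    Ranks past the end of a short ranking count as non-relevant.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(relevance[:k]) / k


def average_precision(relevance: Sequence[bool], num_relevant: int) -> float:
    """Sum of the precision at each relevant document's rank,
    divided by the total number of relevant documents for the query.

    `num_relevant` counts all relevant documents, including any
    that the ranking never retrieved.
    """
    if num_relevant <= 0:
        return 0.0
    hits = 0
    total = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / num_relevant


# Hypothetical ranking for one query: relevant documents at ranks 1 and 3,
# with 3 relevant documents known for the query overall.
ranking = [True, False, True, False, False]
p_at_5 = precision_at_k(ranking, 5)
ap = average_precision(ranking, num_relevant=3)
```

In an experiment of the kind the abstract describes, each score would be averaged over the full query set (for example 25 or 50 queries) before two retrieval methods are compared.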

Chris Buckley, Ellen M. Voorhees

Evidence Combination | Information Management | Information Retrieval | Multiple Evidence Combination | SIGIR 2000 |

Related Content

» A Quantitative Stability Measure for Graspless Manipulation

» Evaluating Stability and Comparing Output of Feature Selectors that Optimize Feature Subse...

» Evaluating Information Content by Factoid Analysis: Human annotation and stability

» New resampling method for evaluating stability of clusters

» Mobility and stability evaluation in wireless multihop networks using multiplayer games

» Assessing the Effect of Inconsistent Assessors on Summarization Evaluation

» On Evaluation Methodologies for Text Segmentation Algorithms

» Intrinsic Stabilizers of Planar Curves

» Improving stability for peer-to-peer multicast overlays by active measurements

Post Info

Added	01 Aug 2010
Updated	01 Aug 2010
Type	Conference
Year	2000
Where	SIGIR
Authors	Chris Buckley, Ellen M. Voorhees
