The impact of collection size on relevance and diversity

It has been observed that precision increases with collection size. One explanation could be that the redundancy of information increases, making it easier to find multiple documents conveying the same information. Arguably, a user has no interest in reading the same information over and over, but would prefer a set of diverse search results covering multiple aspects of the search topic. In this paper, we look at the impact of collection size on the relevance and diversity of retrieval results by down-sampling the collection. Our main finding is that we can improve diversity by randomly removing the majority of the results: this significantly reduces redundancy and only marginally affects subtopic coverage.

Categories and Subject Descriptors: H.3.4 [Information Storage and Retrieval]: Systems and Software - performance evaluation (efficiency and effectiveness)

General Terms: Experimentation, Measurement, Performance
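
To make the down-sampling idea concrete, the sketch below randomly drops documents from a ranked result list and then measures subtopic coverage and redundancy in the top results. It is a toy illustration under stated assumptions, not the authors' experimental setup: the function names (`downsample`, `subtopic_recall`, `redundancy`), the document-to-subtopic mapping, and the redundancy measure itself are all hypothetical, and sampling is applied to a result list rather than to the underlying collection.

```python
import random


def downsample(ranking, keep_fraction, seed=0):
    """Keep each document independently with probability keep_fraction,
    preserving the original rank order (hypothetical helper)."""
    rng = random.Random(seed)
    return [doc for doc in ranking if rng.random() < keep_fraction]


def subtopic_recall(ranking, subtopics, k):
    """Fraction of all known subtopics covered by the top-k documents."""
    all_subtopics = set().union(*subtopics.values())
    if not all_subtopics:
        return 0.0
    covered = set()
    for doc in ranking[:k]:
        covered |= subtopics.get(doc, set())
    return len(covered) / len(all_subtopics)


def redundancy(ranking, subtopics, k):
    """Fraction of relevant top-k documents that add no new subtopic.
    This is an illustrative measure, assumed for this sketch only."""
    seen = set()
    relevant = 0
    redundant = 0
    for doc in ranking[:k]:
        topics = subtopics.get(doc, set())
        if not topics:
            continue  # non-relevant documents are ignored here
        relevant += 1
        if topics <= seen:
            redundant += 1
        seen |= topics
    return redundant / relevant if relevant else 0.0


# Toy data (made up): subtopic "a" is heavily duplicated near the top.
ranking = ["d1", "d2", "d3", "d4", "d5", "d6"]
subtopics = {
    "d1": {"a"}, "d2": {"a"}, "d3": {"a"},
    "d4": {"b"}, "d5": {"a"}, "d6": {"c"},
}

if __name__ == "__main__":
    sample = downsample(ranking, keep_fraction=0.5, seed=42)
    for name, run in (("full", ranking), ("sampled", sample)):
        print(name, run,
              subtopic_recall(run, subtopics, k=3),
              redundancy(run, subtopics, k=3))
```

Comparing the two printed rows for several seeds and keep fractions gives a rough feel for the trade-off the abstract describes; whether coverage at a fixed cutoff rises or falls on a given sample depends on which documents happen to survive.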
Marijn Koolen, Jaap Kamps
Added 31 Aug 2010
Updated 31 Aug 2010
Type Conference
Year 2010
Authors Marijn Koolen, Jaap Kamps