Improving Retrievability and Recall by Automatic Corpus Partitioning

Abstract. With increasing volumes of data, much effort has been devoted to finding the most suitable answer to an information need. However, in many domains, the question of whether any specific information item can be found at all via a reasonable set of queries is essential. This concept of retrievability of information has evolved into an important evaluation measure of IR systems in recall-oriented application domains. While several studies have evaluated retrieval bias in systems, solid validation of the impact of retrieval bias and the development of methods to counter the low retrievability of certain document types would be desirable. This paper provides an in-depth study of retrievability characteristics over queries of different lengths in a large benchmark corpus, validating previous studies. It analyzes the possibility of automatically categorizing documents into low- and high-retrievability documents based on document properties rather than complex retrievability analysis. We furthermore s...
Shariq Bashir, Andreas Rauber
Added 22 May 2011
Updated 22 May 2011
Type Journal
Year 2010