We reveal that the Okapi BM25 retrieval function tends to overly penalize very long documents. To address this problem, we present a simple yet effective extension of BM25, namel...
We investigate the connection between part of speech (POS) distribution and content in language. We define POS blocks to be groups of parts of speech. We hypothesise that there ex...
Probabilistic retrieval models usually rank documents based on a scalar quantity. However, such models provide no estimate of the uncertainty associated with a document’s rank. Fu...
Jianhan Zhu, Jun Wang, Michael J. Taylor, Ingemar ...
Digital libraries are increasingly available on the web. However, retrieving information from these libraries is difficult because their sources are heterogeneous and distributed. Thus, w...
Data fusion has been investigated by many researchers in the information retrieval community and has become an effective technique for improving retrieval effectiveness. In this p...
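The first abstract states that Okapi BM25 tends to over-penalize very long documents. One way to see the kind of behaviour involved is the standard BM25 term-frequency component, tf·(k1+1) / (tf + k1·(1 − b + b·dl/avgdl)). For a fixed term frequency, this component shrinks toward zero as the document length dl grows, so a term that does occur in a very long document can contribute almost as little as an absent term. The sketch below is illustrative only and is not the extension the abstract proposes, whose name is cut off above. The function name `bm25_tf_weight`, the parameter values k1 = 1.2 and b = 0.75 (commonly used defaults), and the toy collection statistics are all assumptions. IDF is omitted because it does not depend on document length.

```python
def bm25_tf_weight(tf: float, doc_len: float, avg_doc_len: float,
                   k1: float = 1.2, b: float = 0.75) -> float:
    """Term-frequency component of the standard Okapi BM25 weight (IDF omitted).

    Hypothetical helper for illustration; k1 and b are assumed typical defaults.
    """
    # Length normalisation: equals 1 for an average-length document and
    # grows linearly with doc_len when b > 0.
    length_norm = 1.0 - b + b * (doc_len / avg_doc_len)
    # Saturating TF transform; a larger length_norm shrinks the weight.
    return tf * (k1 + 1.0) / (tf + k1 * length_norm)


if __name__ == "__main__":
    # Hypothetical collection: average document length of 100 terms,
    # and the query term occurring 5 times in each document considered.
    for doc_len in (100, 1_000, 10_000, 100_000):
        w = bm25_tf_weight(tf=5, doc_len=doc_len, avg_doc_len=100)
        print(f"doc_len={doc_len:>7}: tf weight = {w:.4f}")
```

Because the length factor sits inside the denominator and grows without bound, no fixed term frequency can keep the weight away from zero once a document is long enough relative to avgdl.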