We present a probabilistic model for a document corpus that combines many of the desirable features of previous models. The model is called “GaP” for Gamma-Poisson, the distri...
Despite the widespread use of BM25, there have been few studies examining its effectiveness on a document description over single and multiple field combinations. We determine t...
Maximizing only the relevance between queries and documents will not satisfy users if they want the top search results to present a wide coverage of topics by a few representative...
Yi Liu, Benyu Zhang, Zheng Chen, Michael R. Lyu, W...
Similarity measures for text have historically been an important tool for solving information retrieval problems. In many interesting settings, however, documents are often closel...
Web search is challenging partly because search queries and Web documents use different language styles and vocabularies. This paper provides a quantitative analysis ...