IR
2008

An analysis on document length retrieval trends in language modeling smoothing

Abstract. Document length is widely recognized as an important factor for adjusting retrieval systems. Many models tend to favor the retrieval of either short or long documents and, thus, a length-based correction needs to be applied to avoid any length bias. In Language Modeling for Information Retrieval, smoothing methods are applied to move probability mass from document terms to unseen words, a process that often depends on document length. In this paper, we perform an in-depth study of this behavior, characterized by the document length retrieval trends of three popular smoothing methods, across a number of factors, and of its impact on the length of the documents retrieved and on retrieval performance. First, we theoretically analyze the Jelinek-Mercer, Dirichlet prior and two-stage smoothing strategies and then conduct an empirical analysis. In our analysis, we show how Dirichlet prior smoothing caters for document length more appropriately than Jelinek-Mercer smoothing, which leads to ...
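To make the length dependence concrete, here is a minimal Python sketch of the three smoothing estimators in the standard form used in the language-modeling retrieval literature. It is not code from the paper. The function names, the default values lam = 0.1 and mu = 2000, and the choice to approximate the two-stage method's user background model by the collection model are illustrative assumptions. The hypothetical helper `collection_weight` shows why smoothing can depend on document length: Jelinek-Mercer gives the collection model the same weight lam for every document, while Dirichlet prior smoothing gives it mu / (|d| + mu), which shrinks as documents get longer.

```python
def p_ml(term, doc):
    """Maximum-likelihood document model: c(w, d) / |d|."""
    return doc.count(term) / len(doc)


def p_collection(term, collection):
    """Collection model p(w | C): share of all collection tokens that are `term`."""
    total = sum(len(d) for d in collection)
    return sum(d.count(term) for d in collection) / total


def jelinek_mercer(term, doc, collection, lam=0.1):
    # Fixed linear interpolation: the collection model always gets weight lam,
    # whatever the document length.
    return (1 - lam) * p_ml(term, doc) + lam * p_collection(term, collection)


def dirichlet(term, doc, collection, mu=2000.0):
    # Dirichlet prior: mu * p(w|C) pseudo-counts are added to the observed counts,
    # so short documents are smoothed more heavily than long ones.
    return (doc.count(term) + mu * p_collection(term, collection)) / (len(doc) + mu)


def two_stage(term, doc, collection, lam=0.1, mu=2000.0):
    # Dirichlet smoothing followed by interpolation with a background model.
    # Assumption: the user/query background model p(w|U) is approximated by p(w|C).
    return (1 - lam) * dirichlet(term, doc, collection, mu) + lam * p_collection(term, collection)


def collection_weight(method, doc_len, lam=0.1, mu=2000.0):
    """Fraction of probability mass a method assigns to the collection model."""
    if method == "jm":
        return lam
    if method == "dirichlet":
        return mu / (doc_len + mu)
    if method == "two_stage":
        return lam + (1 - lam) * mu / (doc_len + mu)
    raise ValueError(f"unknown method: {method}")
```

With mu = 2000, the Dirichlet weight on the collection model falls from about 0.95 for a 100-token document to about 0.17 for a 10,000-token document, while the Jelinek-Mercer weight stays at lam = 0.1. The two-stage weight inherits the Dirichlet length dependence but never drops below lam.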
David E. Losada, Leif Azzopardi
Type Journal