Sciweavers

SIGIR
2004
ACM

Corpus structure, language models, and ad hoc information retrieval

Most previous work on the recently developed language-modeling approach to information retrieval focuses on document-specific characteristics, and therefore does not take into account the structure of the surrounding corpus. We propose a novel algorithmic framework in which information provided by document-based language models is enhanced by the incorporation of information drawn from clusters of similar documents. Using this framework, we develop a suite of new algorithms. Even the simplest typically outperforms the standard language-modeling approach in precision and recall, and our new interpolation algorithm posts statistically significant improvements for both metrics over all three corpora tested.
Categories and Subject Descriptors: H.3.3 [Information Search and Retrieval]: Language models, clustering, smoothing
General Terms: Algorithms, Experiments
Keywords: language modeling, aspect models, interpolation model, clustering, smoothing, cluster-based language models
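The general idea in the abstract, enhancing a document-based language model with information from a cluster of similar documents, can be illustrated with a small sketch. This is not the paper's algorithm: it assumes maximum-likelihood unigram models, a single cluster per document, and a simple linear mixture. The function names, the weight `lam`, and the floor `eps` are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def unigram_model(tokens):
    """Maximum-likelihood unigram model: term -> relative frequency."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def interpolated_log_score(query, doc_tokens, cluster_tokens, lam=0.5, eps=1e-9):
    """Query log-likelihood under lam * p_doc + (1 - lam) * p_cluster.

    lam and eps are illustrative assumptions, not values from the paper.
    eps only guards against log(0) for terms unseen in both models.
    """
    p_doc = unigram_model(doc_tokens)
    p_cluster = unigram_model(cluster_tokens)
    score = 0.0
    for term in query:
        p = lam * p_doc.get(term, 0.0) + (1 - lam) * p_cluster.get(term, 0.0)
        score += math.log(p + eps)
    return score


# Toy data: the document never mentions "smoothing", but its cluster does.
doc = ["language", "model", "retrieval"]
cluster = doc + ["cluster", "smoothing", "retrieval"]
query = ["smoothing", "retrieval"]

doc_only = interpolated_log_score(query, doc, cluster, lam=1.0)
mixed = interpolated_log_score(query, doc, cluster, lam=0.5)
```

In this toy setup, `mixed` is higher than `doc_only` because the cluster supplies probability mass for a query term the document itself lacks, which is the intuition behind drawing on corpus structure.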
Oren Kurland, Lillian Lee
Added 30 Jun 2010
Updated 30 Jun 2010
Type Conference
Year 2004
Where SIGIR
Authors Oren Kurland, Lillian Lee