Sciweavers

SIGIR
2009
ACM

Approximating true relevance distribution from a mixture model based on irrelevance data

13 years 10 months ago
Approximating true relevance distribution from a mixture model based on irrelevance data
Pseudo relevance feedback (PRF), which has been widely applied in IR, aims to derive a distribution from the top n pseudo relevant documents D. However, these documents are often a mixture of relevant and irrelevant documents. As a result, the derived distribution is actually a mixture model, which has long been limiting the performance of PRF. This is particularly the case when we deal with difficult queries where the truly relevant documents in D are very sparse. In this situation, it is often easier to identify a small number of seed irrelevant documents, which can form a seed irrelevant distribution. Then, a fundamental and challenging problem arises: solely based on the mixed distribution and a seed irrelevance distribution, how to automatically generate an optimal approximation of the true relevance distribution? In this paper, we propose a novel distribution separation model (DSM) to tackle this problem. Theoretical justifications of the proposed algorithm are given. Evaluatio...
Peng Zhang, Yuexian Hou, Dawei Song
Added 28 May 2010
Updated 28 May 2010
Type Conference
Year 2009
Where SIGIR
Authors Peng Zhang, Yuexian Hou, Dawei Song
Comments (0)