ERCIMDL
2010
Springer

Capacity-Constrained Query Formulation

Given a set of keyphrases, we analyze how to formulate Web queries from these phrases such that, taken together, the queries return a specified number of hits. The use case is a plagiarism detection system that searches the Web for potentially plagiarized passages in a given suspicious document. For this query formulation problem we develop a heuristic search strategy based on co-occurrence probabilities. Compared to the maximal termset strategy [3], which can be considered the most sensible non-heuristic baseline, our expected savings are 50% on average when queries for 9 or 10 phrases are to be constructed.
Matthias Hagen, Benno Stein
Added 09 Nov 2010
Updated 09 Nov 2010
Type Conference
Year 2010
Where ERCIMDL
Authors Matthias Hagen, Benno Stein