
ERCIMDL
2010
Springer

Capacity-Constrained Query Formulation

Given a set of keyphrases, we analyze how to form Web queries from these phrases such that the queries, taken all together, return a specified number of hits. The use case for this problem is a plagiarism detection system that searches the Web for potentially plagiarized passages in a given suspicious document. For the query formulation problem, we develop a heuristic search strategy based on co-occurrence probabilities. Compared to the maximal termset strategy [3], which can be considered the most sensible non-heuristic baseline, our expected savings are on average 50% when queries for 9 or 10 phrases are to be constructed.
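
To make the problem statement concrete, here is a minimal toy sketch of packing keyphrases into conjunctive queries under a hit budget. It is not the heuristic from the paper. It assumes that phrases occur independently, which is a crude stand-in for real co-occurrence probabilities, and it applies the budget per query rather than to all queries together. The names `estimated_hits`, `formulate_queries`, `index_size` and `capacity`, as well as all hit counts, are hypothetical.

```python
from math import prod


def estimated_hits(phrases, hits, index_size):
    """Estimate the hit count of a conjunctive query.

    Assumption (not from the paper): phrases occur independently, so the
    joint probability is the product of the single-phrase probabilities.
    """
    return index_size * prod(hits[p] / index_size for p in phrases)


def formulate_queries(hits, index_size, capacity):
    """Greedily pack phrases into queries, rarest phrase first.

    A query keeps absorbing phrases until its estimated hit count is
    within `capacity`. The last query may still exceed the budget if
    the phrases run out first.
    """
    remaining = sorted(hits, key=hits.get)  # rarest phrases first
    queries = []
    while remaining:
        query = [remaining.pop(0)]
        while remaining and estimated_hits(query, hits, index_size) > capacity:
            query.append(remaining.pop(0))
        queries.append(query)
    return queries


# Hypothetical single-phrase hit counts in a hypothetical index of 1,000,000 pages.
example_hits = {"a": 1000, "b": 2000, "c": 500, "d": 100}
example_queries = formulate_queries(example_hits, index_size=1_000_000, capacity=10)
```

With these made-up numbers, every phrase ends up in exactly one query. A real system would replace the independence estimate with measured co-occurrence statistics.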
Matthias Hagen, Benno Stein
Added 09 Nov 2010
Updated 09 Nov 2010
Type Conference
Year 2010
Where ERCIMDL
Authors Matthias Hagen, Benno Stein