We consider the problem of DUST: Different URLs with Similar Text. Such duplicate URLs are prevalent in web sites, as web server software often uses aliases and redirections, and...
We analyze the recent phenomenon termed a Link Bomb, and investigate the optimal attack pattern for a group of web pages attempting to link bomb a specific web page. The typical ...
The quality of document content, an issue usually ignored in the traditional ad hoc retrieval task, is critical for Web search. Web pages have a huge var...
In this paper we show how Constraint Programming (CP) techniques can improve the efficiency and applicability of grid-based algorithms for optimising surface contact between comple...
Forming test collection relevance judgments from the pooled output of multiple retrieval systems has become the standard process for creating resources such as the TREC, CLEF, and...