SIGIR
2009
ACM

The impact of crawl policy on web search effectiveness

Crawl selection policy has a direct influence on Web search effectiveness, because a useful page that is not selected for crawling will also be absent from search results. Yet there has been little or no work on measuring this effect. We introduce an evaluation framework, based on relevance judgments pooled from multiple search engines, measuring the maximum potential NDCG that is achievable using a particular crawl. This allows us to evaluate different crawl policies and investigate important scenarios like selection stability over multiple iterations. We conduct two sets of crawling experiments at the scale of 1 billion and 100 million pages respectively. These show that crawl selection based on PageRank, indegree and trans-domain indegree all allow better retrieval effectiveness than a simple breadth-first crawl of the same size. PageRank is the most reliable and effective method. Trans-domain indegree can outperform PageRank, but over multiple crawl iterations it is less effecti...
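The "maximum potential NDCG" idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes the common NDCG formulation (gain 2^rel - 1, log2 rank discount), and the function names, the cutoff k and the input shapes are hypothetical choices made for illustration. For one query, it ranks the judged pages that the crawl actually contains in the best possible order and normalizes by the ideal ranking over all judged pages, so a crawl that misses useful pages is capped below 1.0 regardless of how good the ranker is.

```python
import math


def dcg(gains):
    """Discounted cumulative gain of a list of gains given in rank order."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))


def max_potential_ndcg(judgments, crawled, k=10):
    """Upper bound on NDCG@k for one query, given a crawl (illustrative sketch).

    judgments: dict mapping URL -> graded relevance label (pooled judgments).
    crawled:   set of URLs selected by the crawl policy.
    Assumes gain = 2**rel - 1, a common convention that the abstract does not confirm.
    """
    def gain(rel):
        return 2 ** rel - 1

    # Ideal ranking over every judged page, whether crawled or not.
    ideal = sorted((gain(r) for r in judgments.values()), reverse=True)[:k]
    # Best ranking a perfect ranker could produce from the crawled pages only.
    best = sorted(
        (gain(r) for url, r in judgments.items() if url in crawled),
        reverse=True,
    )[:k]

    ideal_dcg = dcg(ideal)
    return dcg(best) / ideal_dcg if ideal_dcg > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical judgments for a single query.
    judged = {"a": 3, "b": 2, "c": 0}
    print(max_potential_ndcg(judged, {"a", "b", "c"}))  # full coverage
    print(max_potential_ndcg(judged, {"b", "c"}))       # best page missing
```

Averaging this value over a query set gives one score per crawl, which is one way to compare selection policies of equal size under the stated assumptions.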
Dennis Fetterly, Nick Craswell, Vishwa Vinay
Added: 28 May 2010
Updated: 28 May 2010
Type: Conference
Year: 2009
Where: SIGIR
Authors: Dennis Fetterly, Nick Craswell, Vishwa Vinay