Sciweavers

WWW
2009
ACM

Graph based crawler seed selection

This paper identifies and explores the problem of seed selection in a web-scale crawler. We argue that seed selection is a nontrivial and very important problem. Selecting proper seeds can increase the number of pages a crawler will discover, and can result in a collection with more "good" and fewer "bad" pages. Based on an analysis of the graph structure of the web, we propose several seed selection algorithms. The effectiveness of these algorithms is demonstrated by our experimental results on real web data.
Categories and Subject Descriptors: H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval
General Terms: Algorithms, Design, Experimentation, Performance
Keywords: Crawler, Seed Selection, PageRank, Graph Analysis
Shuyi Zheng, Pavel Dmitriev, C. Lee Giles
Added 21 Nov 2009
Updated 21 Nov 2009
Type Conference
Year 2009
Where WWW
Authors Shuyi Zheng, Pavel Dmitriev, C. Lee Giles
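
The abstract's claim that seed choice changes how many pages a crawler discovers can be illustrated with a small sketch. This is not one of the paper's algorithms, which the abstract does not describe; it is a generic greedy coverage heuristic on a made-up link graph. The names `toy_web`, `reachable`, and `greedy_seeds` are hypothetical, and the sketch ignores page quality ("good" versus "bad" pages) and PageRank entirely.

```python
from collections import deque

# Hypothetical toy link graph: page -> pages it links to (not real web data).
toy_web = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["e"],
    "e": [],
    "f": ["d"],
}


def reachable(graph, seeds):
    """Return every page a crawler would discover by following links from `seeds`."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        for target in graph.get(page, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def greedy_seeds(graph, k):
    """Pick up to k seeds, each time adding the page that discovers the most new pages.

    Illustrative assumption only: a plain greedy coverage heuristic,
    not the method proposed in the paper.
    """
    pages = set(graph) | {t for targets in graph.values() for t in targets}
    chosen, covered = [], set()
    for _ in range(k):
        best, best_gain = None, 0
        # Sorted iteration makes tie-breaking deterministic (first name wins).
        for page in sorted(pages - covered):
            gain = len(reachable(graph, [page]) - covered)
            if gain > best_gain:
                best, best_gain = page, gain
        if best is None:  # everything is already covered
            break
        chosen.append(best)
        covered |= reachable(graph, [best])
    return chosen, covered
```

On this toy graph, the two seeds `["a", "b"]` discover only 3 of the 6 pages, while `greedy_seeds(toy_web, 2)` picks `["a", "f"]` and discovers all 6. The seed budget is the same in both cases; only the choice of seeds differs.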