Random web crawls

This paper proposes a random Web crawl model. A Web crawl is a (biased and partial) image of the Web. This paper deals with the hyperlink structure, i.e. a Web crawl is a graph whose vertices are the pages and whose edges are the hypertextual links. Of course, a Web crawl has a very special structure; we recall some known results about it. We then propose a model generating similar structures. Our model simply simulates a crawl, i.e. it builds and crawls the graph at the same time. The generated graphs have many of the known properties of Web crawls. Our model is simpler than most random Web graph models, but captures the same properties. Notice that it models the crawling process, whereas Web graph models describe the page-writing process.
Categories and Subject Descriptors: I.6.m [Simulation and Modeling]: Miscellaneous
General Terms: Theory
Keywords: web graph, crawling, crawl order, model, hyperlink structure
Toufik Bennouas, Fabien de Montgolfier
Added 21 Nov 2009
Updated 21 Nov 2009
Type Conference
Year 2007
Where WWW
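The abstract describes a model that builds and crawls the graph at the same time. The Python sketch below illustrates that general idea only. It is not the authors' model: the abstract does not give the model's rules, so the function name `simulate_crawl`, the out-degree distribution, the probability `p_new` that a link points to a not-yet-discovered page, and the uniform choice among known pages are all assumptions made for illustration.

```python
import random
from collections import deque


def simulate_crawl(num_pages, mean_out_degree=7.0, p_new=0.5, order="bfs", seed=0):
    """Generate a graph while crawling it (hypothetical illustration).

    Pages are numbered in order of discovery. Returns (edges, discovered),
    where edges is a list of (source, target) pairs and discovered is the
    number of pages seen so far, crawled or not.
    """
    rng = random.Random(seed)
    frontier = deque([0])  # discovered but not yet crawled; page 0 is the seed
    discovered = 1
    edges = []
    crawled = 0

    while frontier and crawled < num_pages:
        # The crawl order decides which discovered page is visited next.
        page = frontier.popleft() if order == "bfs" else frontier.pop()
        crawled += 1

        # Assumed out-degree law: 1 + a truncated exponential draw.
        out_degree = 1 + int(rng.expovariate(1.0 / mean_out_degree))

        for _ in range(out_degree):
            if rng.random() < p_new:
                # The link discovers a brand-new page, which joins the frontier.
                target = discovered
                discovered += 1
                frontier.append(target)
            else:
                # The link points to a page already known (assumed uniform).
                target = rng.randrange(discovered)
            edges.append((page, target))

    return edges, discovered


if __name__ == "__main__":
    edges, discovered = simulate_crawl(1000, seed=42)
    print(len(edges), "links over", discovered, "discovered pages")
```

Because a page's links are drawn only when the crawler visits it, pages left in the frontier at the end have incoming links but no outgoing ones, which is one way a crawl can be a partial image of the underlying graph. Passing `order="dfs"` switches the sketch to a depth-first crawl order.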