STACS
2009
Springer

A Comparison of Techniques for Sampling Web Pages

As the World Wide Web grows rapidly, it is becoming increasingly challenging to gather representative information about it. Instead of crawling the web exhaustively, one has to resort to other techniques, such as random sampling, to determine the properties of the web. Unfortunately, no approach has been shown to sample web pages in an unbiased way. Three promising web sampling algorithms are based on random walks [6, 2, 9]. Each has been evaluated individually, but on different data sets, so a direct comparison is not possible. In this paper we compare these algorithms by running them on the web with the same computation power and for the same amount of time. We then propose improvements based on the experimental results.
Keywords: URL sampling, random walks, PageRank, information gathering from the web.
Added 20 May 2010
Updated 20 May 2010
Type Conference
Year 2009
Where STACS
Authors Eda Baykan, Monika Rauch Henzinger, Stefan F. Keller, Sebastian De Castelberg, Markus Kinzler