Outlink estimation for pagerank computation under missing data

The enormity and rapid growth of the web-graph force quantities such as its pagerank to be computed under missing information, namely the outlinks of pages that have not yet been crawled. This paper examines the role played by the size and distribution of this missing data in determining the accuracy of the computed pagerank, focusing on three questions: (i) the accuracy of pageranks under missing information, (ii) the size at which a crawl process may be aborted while still ensuring reasonable accuracy of pageranks, and (iii) algorithms to estimate pageranks under such missing information. The first two questions are addressed using simple bounds that relate the expected distance between the true and computed pageranks to the size of the missing data. The third question is explored by devising algorithms to predict the pageranks when full information is not available. A key feature of the "dangling link estimation" and "clustered link estimati...
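To make the setting in the abstract concrete, the following is a minimal sketch of how missing outlinks can perturb a computed pagerank. It is not the authors' estimation algorithm: it uses the standard power iteration with uncrawled pages treated as dangling nodes whose rank is spread uniformly, and it measures the L1 distance between the ranks from a full graph and from a partial crawl. The 4-page graph, the damping factor of 0.85, and the function names `pagerank` and `l1_distance` are all assumptions chosen for illustration.

```python
def pagerank(out_links, n, damping=0.85, iters=100):
    """Power iteration over pages 0..n-1.

    out_links maps a page to the list of pages it links to. A page with no
    known outlinks (for example, one that has not been crawled) is treated
    as dangling: its rank is redistributed uniformly over all pages.
    """
    rank = [1.0 / n] * n
    for _ in range(iters):
        # Total rank currently held by dangling pages.
        dangling = sum(rank[p] for p in range(n) if not out_links.get(p))
        # Teleportation term plus the uniform share of dangling mass.
        new = [(1.0 - damping) / n + damping * dangling / n] * n
        # Each page with known outlinks splits its rank evenly among them.
        for p, targets in out_links.items():
            for t in targets:
                new[t] += damping * rank[p] / len(targets)
        rank = new
    return rank


def l1_distance(a, b):
    """Sum of absolute differences between two rank vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Hypothetical fully crawled graph (illustrative only).
full = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}
# Partial crawl: page 3 has not been crawled, so its outlinks are unknown.
partial = {p: targets for p, targets in full.items() if p != 3}

true_rank = pagerank(full, 4)
est_rank = pagerank(partial, 4)
print("L1 distance between true and computed pagerank:",
      l1_distance(true_rank, est_rank))
```

Dropping the outlinks of more pages from `partial` and re-running gives a rough feel for how the distance changes with the amount of missing data on this toy graph.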
Sreangsu Acharyya, Joydeep Ghosh
Added 22 Nov 2009
Updated 22 Nov 2009
Type Conference
Year 2004
Where WWW
Authors Sreangsu Acharyya, Joydeep Ghosh