Wikipedia is one of the most popular information sources on the Web. The free encyclopedia is densely linked. The link structure in Wikipedia differs from the Web at large: interna...
The PageRank algorithm, used in the Google search engine, greatly improves the results of Web search by taking into account the link structure of the Web. PageRank assigns to a pa...
Many web links mislead human surfers and automated crawlers because they point to changed content, out-of-date information, or invalid URLs. It is a particular problem for large, ...
Link analysis is a key technology in contemporary web search engines. Most of the previous work on link analysis only used information from one snapshot of web graph. Since commer...
Lei Yang, Lei Qi, Yan-Ping Zhao, Bin Gao, Tie-Yan ...
A large fraction of the URLs on the web contain duplicate (or near-duplicate) content. De-duping URLs is an extremely important problem for search engines, since all the principal...