There are a number of established products on the market for wrapping—semi-automatic navigation and extraction of data—from web pages. These solutions make use of the inherent...
One of the most important steps in web crawling is determining the starting points, or seed selection. This paper identifies and explores the problem of seed selection in webscal...
Cloaking is a search engine spamming technique used by some Web sites to deliver one page to a search engine for indexing while serving an entirely different page to users browsin...
This paper presents the design and user evaluation of SmartBack, a feature that complements the standard Back button by enabling users to jump directly to key pages in their navig...
There has been considerable past work on efficiently computing top k objects by aggregating information from multiple ranked lists of these objects. An important instance of this...
Ravi Kumar, Kunal Punera, Torsten Suel, Sergei Vas...