Sciweavers

Search: Scheduling Algorithms for Web Crawling
WWW 2008 (ACM)
Recrawl scheduling based on information longevity
It is crucial for a web crawler to distinguish between ephemeral and persistent content. Ephemeral content (e.g., quote of the day) is usually not worth crawling, because by the t...
Christopher Olston, Sandeep Pandey
WWW 2003 (ACM)
Adaptive on-line page importance computation
The computation of page importance in a huge dynamic graph has recently attracted a lot of attention because of the web. Page importance, or page rank, is defined as the fixpoint o...
Serge Abiteboul, Mihai Preda, Gregory Cobena
DMIN 2007
Crawling Attacks Against Web-based Recommender Systems
User profiles derived from Web navigation data are used in important e-commerce applications such as Web personalization, recommender systems, and Web analytics. In the open en...
Runa Bhaumik, Robin D. Burke, Bamshad Mobasher
WWW 2009 (ACM)
Sitemaps: above and beyond the crawl of duty
Comprehensive coverage of the public web is crucial to web search engines. Search engines use crawlers to retrieve pages and then discover new ones by extracting the pages' o...
Uri Schonfeld, Narayanan Shivakumar
ICDE 2007 (IEEE)
DSphere: A Source-Centric Approach to Crawling, Indexing and Searching the World Wide Web
We describe DSPHERE, a decentralized system for crawling, indexing, searching and ranking of documents in the World Wide Web. Unlike most of the existing search technologies tha...
Bhuvan Bamba, Ling Liu, James Caverlee, Vaibhav Pa...