Sciweavers

165 search results - page 6 / 33
» Distributed Indexing of the Web Using Migrating Crawlers
WWW
2004
ACM
15 years 10 months ago
Distributed community crawling
The massive distribution of the crawling task can lead to inefficient exploration of the same portion of the Web. We propose a technique to guide crawlers' exploration based on the...
Fabrizio Costa, Paolo Frasconi
AIRWEB
2007
Springer
15 years 3 months ago
A Taxonomy of JavaScript Redirection Spam
Redirection spam presents a web page with false content to a crawler for indexing, but automatically redirects the browser to a different web page. Redirection is usually immediat...
Kumar Chellapilla, Alexey Maykov
WEBDB
2007
Springer
15 years 3 months ago
A clustering-based sampling approach for refreshing search engine's database
Due to resource constraints, search engines usually have difficulties keeping the local database completely synchronized with the Web. To detect as many changes as possible, the...
Qingzhao Tan, Ziming Zhuang, Prasenjit Mitra, C. L...
WWW
2007
ACM
15 years 10 months ago
Detecting near-duplicates for web crawling
Near-duplicate web documents are abundant. Two such documents differ from each other in a very small portion that displays advertisements, for example. Such differences are irrele...
Gurmeet Singh Manku, Arvind Jain, Anish Das Sarma
WWW
2008
ACM
15 years 10 months ago
Investigating web services on the world wide web
Searching for Web service access points is no longer attached to service registries as Web search engines have become a new major source for discovering Web services. In this work...
Eyhab Al-Masri, Qusay H. Mahmoud