Sciweavers

» Crawling on web graphs
WWW
2004
ACM
16 years 2 months ago
Distributed community crawling
The massive distribution of the crawling task can lead to inefficient exploration of the same portion of the Web. We propose a technique to guide crawlers' exploration based on the...
Fabrizio Costa, Paolo Frasconi
WWW
2007
ACM
16 years 2 months ago
Detecting near-duplicates for web crawling
Near-duplicate web documents are abundant. Two such documents differ from each other in a very small portion that displays advertisements, for example. Such differences are irrele...
Gurmeet Singh Manku, Arvind Jain, Anish Das Sarma
JUCS
2008
15 years 1 month ago
Structure-Based Crawling in the Hidden Web
The number of applications that need to crawl the Web to gather data is growing at an ever increasing pace. In some cases, the criterion to determine what pages must be included ...
Márcio L. A. Vidal, Altigran Soares da Silv...
WWW
2005
ACM
16 years 2 months ago
User-centric Web crawling
Search engines are the primary gateways of information access on the Web today. Behind the scenes, search engines crawl the Web to populate a local indexed repository of Web pages...
Sandeep Pandey, Christopher Olston
ESWS
2008
Springer
15 years 3 months ago
Instance Based Clustering of Semantic Web Resources
The original Semantic Web vision was explicit in the need for intelligent autonomous agents that would represent users and help them navigate the Semantic Web. We argue t...
Gunnar Aastrand Grimnes, Peter Edwards, Alun D. Pr...