Sciweavers

611 search results - page 2 / 123
» Random web crawls
WWW
2003
ACM
14 years 5 months ago
Efficient URL caching for world wide web crawling
Crawling the web is deceptively simple: the basic algorithm is (a) Fetch a page (b) Parse it to extract all linked URLs (c) For all the URLs not seen before, repeat (a)–(c). Howev...
Andrei Z. Broder, Marc Najork, Janet L. Wiener
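
The three steps quoted in the abstract above can be sketched in a few lines of Python. This is only an illustration of the basic loop, not the caching scheme the paper goes on to study. The breadth-first order, the `crawl` and `LinkExtractor` names, the `max_pages` limit, and the in-memory `PAGES` dictionary that stands in for the network are all assumptions made for the example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, fetch, max_pages=100):
    """Crawl breadth-first from seed; fetch(url) returns HTML text or None."""
    seen = {seed}              # every URL discovered so far
    frontier = deque([seed])   # discovered but not yet fetched
    visited = []               # URLs fetched successfully, in fetch order
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        html = fetch(url)                  # (a) fetch a page
        if html is None:
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)                  # (b) parse it to extract linked URLs
        for href in parser.links:
            link = urljoin(url, href)      # resolve relative links
            if link not in seen:           # (c) queue only URLs not seen before
                seen.add(link)
                frontier.append(link)
    return visited


# Hypothetical three-page "web" so the sketch runs without network access.
PAGES = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a> <a href="/">home</a>',
    "http://example.test/b": '<a href="/a">A</a>',
}

order = crawl("http://example.test/", PAGES.get)
print(order)
```

The `seen` set is the part that grows without bound on a real crawl, since every discovered URL must be checked against it; in this sketch it is simply held in memory.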
INTR
2002
13 years 4 months ago
Methodologies for crawler based Web surveys
There have been many attempts to study the content of the web, either through human or automatic agents. Five different previously used web survey methodologies are described and ...
Mike Thelwall
ADBIS
2003
Springer
13 years 9 months ago
UCYMICRA: Distributed Indexing of the Web Using Migrating Crawlers
Due to the tremendous increase rate and the high change frequency of Web documents, maintaining an up-to-date index for searching purposes (search engines) is becoming a challenge....
Odysseas Papapetrou, Stavros Papastavrou, George S...
WWW
2008
ACM
14 years 5 months ago
iRobot: an intelligent crawler for web forums
We study in this paper the Web forum crawling problem, which is a very fundamental step in many Web applications, such as search engine and Web data mining. As a typical user-crea...
Rui Cai, Jiang-Ming Yang, Wei Lai, Yida Wang, Lei ...
WWW
2004
ACM
14 years 5 months ago
Distributed location aware web crawling
Distributed crawling has shown that it can overcome important limitations of today's crawling paradigm. However, the optimal benefits of this approach are usually limited...
Odysseas Papapetrou, George Samaras