Sciweavers

WWW
2001
ACM

Intelligent crawling on the World Wide Web with arbitrary predicates

The enormous growth of the World Wide Web in recent years has made it important to perform resource discovery efficiently. Consequently, several new ideas have been proposed in recent years; among them, a key technique is focused crawling, which is able to crawl particular topical portions of the World Wide Web quickly without having to explore all web pages. In this paper, we propose the novel concept of intelligent crawling, which actually learns characteristics of the linkage structure of the World Wide Web while performing the crawling. Specifically, the intelligent crawler uses the inlinking web page content, candidate URL structure, or other behaviors of the inlinking web pages or siblings in order to estimate the probability that a candidate is useful for a given crawl. This is a much more general framework than the focused crawling technique, which is based on a pre-defined understanding of the topical structure of the web. The techniques discussed in this paper are applicable for crawl...
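The idea in the abstract, scoring each candidate from what has been observed about the pages that link to it and updating those statistics as the crawl proceeds, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function name intelligent_crawl, the in-memory toy graph standing in for the web, and the scoring rule (a smoothed fraction of useful pages per in-linking word) are hypothetical and are not the estimator used in the paper. Only in-linking page content is used as evidence; URL structure and sibling behavior are left out.

```python
from collections import defaultdict


def intelligent_crawl(graph, words, seeds, predicate, budget):
    """Best-first crawl that learns, during the crawl, which words on
    in-linking pages tend to point at pages satisfying `predicate`.

    graph:     dict url -> list of out-linked urls (toy stand-in for the web)
    words:     dict url -> set of words on that page
    predicate: user-defined callable(url) -> bool (the "arbitrary predicate")
    budget:    maximum number of pages to fetch
    """
    seen = defaultdict(int)      # word -> crawled pages it pointed at
    good = defaultdict(int)      # word -> of those, how many were useful
    evidence = defaultdict(set)  # candidate url -> words of crawled in-linking pages
    frontier = list(dict.fromkeys(seeds))
    queued = set(frontier)
    visited = []

    def score(url):
        # Assumed scoring rule: mean Laplace-smoothed usefulness rate of the
        # in-linking words; 0.5 when nothing is known about the candidate.
        ws = evidence.get(url)
        if not ws:
            return 0.5
        return sum((good[w] + 1) / (seen[w] + 2) for w in ws) / len(ws)

    while frontier and len(visited) < budget:
        url = max(frontier, key=score)  # most promising candidate so far
        frontier.remove(url)
        visited.append(url)

        # Learn from the outcome: credit the words that led to this page.
        useful = predicate(url)
        for w in evidence.get(url, ()):
            seen[w] += 1
            good[w] += int(useful)

        # Expand: this page's words become evidence for its out-links.
        for nxt in graph.get(url, ()):
            evidence[nxt] |= words.get(url, set())
            if nxt not in queued:
                queued.add(nxt)
                frontier.append(nxt)
    return visited


# Toy example: pages reached through "sports" pages satisfy the predicate.
graph = {
    "root": ["sports_hub", "news_hub"],
    "sports_hub": ["s1", "s2"],
    "news_hub": ["n1", "n2"],
    "s1": ["s3"],
}
words = {
    "root": {"portal"},
    "sports_hub": {"sports"},
    "news_hub": {"news"},
    "s1": {"sports"},
}
order = intelligent_crawl(graph, words, ["root"],
                          lambda u: u in {"s1", "s2", "s3"}, budget=5)
print(order)
```

In this toy run the crawler has no prior notion of topic. After the first useful page is found behind a "sports" page, candidates with "sports" in-links outrank the unexplored "news" branch, which is the learning-while-crawling behavior the abstract describes.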
Charu C. Aggarwal, Fatima Al-Garawi, Philip S. Yu
Added 22 Nov 2009
Updated 22 Nov 2009
Type Conference
Year 2001
Where WWW
Authors Charu C. Aggarwal, Fatima Al-Garawi, Philip S. Yu