
CN
1999

Focused Crawling: A New Approach to Topic-Specific Web Resource Discovery

The rapid growth of the World-Wide Web poses unprecedented scaling challenges for general-purpose crawlers and search engines. In this paper we describe a new hypertext resource discovery system called a Focused Crawler. The goal of a focused crawler is to selectively seek out pages that are relevant to a pre-defined set of topics. The topics are specified not using keywords, but using exemplary documents. Rather than collecting and indexing all accessible Web documents to be able to answer all possible ad-hoc queries, a focused crawler analyzes its crawl boundary to find the links that are likely to be most relevant for the crawl, and avoids irrelevant regions of the Web. This leads to significant savings in hardware and network resources, and helps keep the crawl more up-to-date. To achieve such goal-directed crawling, we designed two hypertext mining programs that guide our crawler: a classifier that evaluates the relevance of a hypertext document with respect to the focus topics, ...
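The crawl loop the abstract describes (score each fetched page against topics defined by exemplary documents, then follow links from the most relevant pages first and leave irrelevant regions unexplored) can be sketched as a best-first search over a priority queue. This is a minimal illustration under stated assumptions, not the authors' system: the bag-of-words cosine scorer is a simple stand-in for their hypertext classifier, and `TOY_WEB`, `make_relevance_scorer`, `focused_crawl`, and the `threshold` parameter are hypothetical names introduced here. The in-memory link graph replaces real HTTP fetching so that the example runs on its own.

```python
import heapq
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def make_relevance_scorer(exemplars):
    """Build a scorer from exemplary documents (assumption: a plain
    term-count centroid stands in for a trained topic classifier)."""
    centroid = Counter()
    for doc in exemplars:
        centroid.update(bag_of_words(doc))
    return lambda text: cosine(bag_of_words(text), centroid)


def focused_crawl(seeds, fetch, relevance, max_pages=10, threshold=0.1):
    """Best-first crawl: links inherit the relevance of the page that
    cites them, and pages scoring below `threshold` are not expanded."""
    # heapq is a min-heap, so priorities are stored negated.
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    visited = []  # (url, score) pairs in visit order

    while frontier and len(visited) < max_pages:
        _, url = heapq.heappop(frontier)
        page = fetch(url)
        if page is None:
            continue
        text, links = page
        score = relevance(text)
        visited.append((url, score))
        if score < threshold:
            continue  # irrelevant page: do not follow its outlinks
        for link in links:
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-score, link))
    return visited


# Hypothetical in-memory "web": url -> (page text, outgoing links).
TOY_WEB = {
    "a": ("cycling bike routes bicycle repair", ["b", "c"]),
    "b": ("bicycle gear reviews for cycling", ["d"]),
    "c": ("stock market news finance", ["e"]),
    "d": ("mountain bike trails cycling", []),
    "e": ("mortgage rates finance", []),
}

scorer = make_relevance_scorer(["cycling bicycle bike", "bike trails cycling gear"])
visited = focused_crawl(["a"], TOY_WEB.get, scorer)
for url, score in visited:
    print(url, round(score, 3))
```

In this toy graph, page `c` is still fetched because a relevant page links to it, but it scores zero against the exemplars, so its outlink `e` never enters the frontier. That pruning is where the savings in fetches come from in this sketch.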
Soumen Chakrabarti, Martin van den Berg, Byron Dom
Type Journal
Year 1999
Where CN