Sciweavers

VLDB
1999
ACM

Distributed Hypertext Resource Discovery Through Examples

We describe the architecture of a hypertext resource discovery system using a relational database. Such a system can answer questions that combine page contents, metadata, and hyperlink structure in powerful ways, such as “find the number of links from an environmental protection page to a page about oil and natural gas over the last year.” A key problem in populating the database in such a system is to discover web resources related to the topics involved in such queries. We argue that a keyword-based “find similar” search based on a giant all-purpose crawler is neither necessary nor adequate for resource discovery. Instead we exploit two properties: pages tend to cite pages with related topics, and, given that a page u cites a page about a desired topic, it is very likely that u cites additional desirable pages. We exploit these properties by using a crawler controlled by two hypertext mining programs: (1) a classifier that evaluates the relevance of a region of th...
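The example query quoted in the abstract can be illustrated with a small relational sketch. The schema below (a `page` table with a single `topic` label and a `link` table with a `seen` date) and the function names are assumptions made for illustration only; the listing does not describe the system's actual tables, and in the described system topic labels would come from a classifier rather than being entered by hand.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical page/link schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE page (url TEXT PRIMARY KEY, topic TEXT);
        CREATE TABLE link (src TEXT, dst TEXT, seen TEXT);  -- seen: ISO date
        """
    )
    # Made-up rows, used only to exercise the query.
    conn.executemany(
        "INSERT INTO page VALUES (?, ?)",
        [
            ("env1", "environmental protection"),
            ("env2", "environmental protection"),
            ("oil1", "oil and natural gas"),
            ("misc", "sports"),
        ],
    )
    conn.executemany(
        "INSERT INTO link VALUES (?, ?, ?)",
        [
            ("env1", "oil1", "1999-03-01"),
            ("env2", "oil1", "1997-01-01"),
            ("env1", "misc", "1999-02-01"),
            ("misc", "oil1", "1999-04-01"),
        ],
    )
    return conn


def count_topic_links(conn, src_topic, dst_topic, since):
    """Count links from src_topic pages to dst_topic pages seen on or after `since`."""
    row = conn.execute(
        """
        SELECT COUNT(*)
        FROM link
        JOIN page AS s ON s.url = link.src
        JOIN page AS d ON d.url = link.dst
        WHERE s.topic = ? AND d.topic = ? AND link.seen >= ?
        """,
        (src_topic, dst_topic, since),
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    db = build_demo_db()
    print(count_topic_links(
        db, "environmental protection", "oil and natural gas", "1998-09-01"
    ))
```

Storing dates as ISO 8601 strings lets the plain string comparison in the `WHERE` clause act as a date comparison, which keeps the sketch free of date-parsing code.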
Soumen Chakrabarti, Martin van den Berg, Byron Dom
Added 05 Aug 2010
Updated 05 Aug 2010
Type Conference
Year 1999
Where VLDB