Sciweavers

563 search results - page 69 / 113
» Crawling the web for structured documents
SAINT
2005
IEEE
15 years 7 months ago
Learning Logic Wrappers for Information Extraction from the Web
This paper discusses a methodology for applying general-purpose first-order inductive learning to extract information from Web documents structured as unranked ordered trees. The...
Costin Badica, Elvira Popescu, Amelia Badica
HT
1996
ACM
15 years 6 months ago
HyPursuit: A Hierarchical Network Search Engine that Exploits Content-Link Hypertext Clustering
HyPursuit is a new hierarchical network search engine that clusters hypertext documents to structure a given information space for browsing and search activities. Our content-link...
Ron Weiss, Bienvenido Vélez, Mark A. Sheldo...
ADC
2006
Springer
15 years 8 months ago
A two-phase rule generation and optimization approach for wrapper generation
Web information extraction is a fundamental issue for web information management and integrations. A common approach is to use wrappers to extract data from web pages or documents...
Yanan Hao, Yanchun Zhang
WEBI
2005
Springer
15 years 7 months ago
Automated Metadata and Instance Extraction from News Web Sites
In this paper, we present automated techniques for extracting metadata instance information by organizing and mining a set of news Web sites. We develop algorithms that detect and...
Srinivas Vadrevu, Saravanakumar Nagarajan, Fatih G...
CIKM
2008
ACM
15 years 3 months ago
Dr. Searcher and Mr. Browser: a unified hyperlink-click graph
We introduce a unified graph representation of the Web, which includes both structural and usage information. We model this graph using a simple union of the Web's hyperlink ...
Barbara Poblete, Carlos Castillo, Aristides Gionis