Sciweavers

502 search results - page 3 / 101
» Extracting Partial Structures from HTML Documents
Sort
View
DKE
2006
139views more  DKE 2006»
13 years 5 months ago
Information extraction from structured documents using k-testable tree automaton inference
Information extraction (IE) addresses the problem of extracting specific information from a collection of documents. Much of the previous work on IE from structured documents, suc...
Raymond Kosala, Hendrik Blockeel, Maurice Bruynoog...
SIGIR
2005
ACM
13 years 11 months ago
Title extraction from bodies of HTML documents and its application to web page retrieval
This paper is concerned with automatic extraction of titles from the bodies of HTML documents. Titles of HTML documents should be correctly defined in the title fields; however, i...
Yunhua Hu, Guomao Xin, Ruihua Song, Guoping Hu, Sh...
WWW
2005
ACM
14 years 6 months ago
Using visual cues for extraction of tabular data from arbitrary HTML documents
We describe a method to extract tabular data from web pages. Rather than just analyzing the DOM tree, we also exploit visual cues in the rendered version of the document to extrac...
Bernhard Krüpl, Marcus Herzog, Wolfgang Gatte...
XSYM
2005
Springer
107views Database» more  XSYM 2005»
13 years 10 months ago
Logic Wrappers and XSLT Transformations for Tuples Extraction from HTML
Abstract. Recently it was shown that existing general-purpose inductive logic programming systems are useful for learning wrappers (known as L-wrappers) to extract data from HTML d...
Costin Badica, Amelia Badica
APWEB
2003
Springer
13 years 10 months ago
Extracting Content Structure for Web Pages Based on Visual Representation
Abstract. A new web content structure based on visual representation is proposed in this paper. Many web applications such as information retrieval, information extraction and auto...
Deng Cai, Shipeng Yu, Ji-Rong Wen, Wei-Ying Ma