There exist two types of wrappers: the string based wrapper such as the LR wrapper, and the tree based wrapper. A tree based wrapper designates extraction regions by nodes on the ...
Information extraction from HTML pages has been conventionally treated as plain text documents extended with HTML tags. However, the growing maturity and correct usage of HTML/XHT...
Research on information extraction from Web pages (wrapping) has seen much activity in recent times (particularly systems implementations), but little work has been done on formal...
This paper is concerned with the problem of structured data extraction from Web pages. The objective of the research is to automatically segment data records in a page, extract da...
We present the Lixto project, which is both a research project in database theory and a commercial enterprise that develops Web data extraction (wrapping) and Web service definiti...
Georg Gottlob, Christoph Koch, Robert Baumgartner,...