Sciweavers

85 search results - page 2 / 17
» Extracting unstructured data from template generated web doc...
Sort
View
ICDM
2007
IEEE
476views Data Mining» more  ICDM 2007»
13 years 11 months ago
FiVaTech: Page-Level Web Data Extraction from Template Pages
In this paper, we proposed a new approach, called FiVaTech for the problem of Web data extraction. FiVaTech is a page-level data extraction system which deduces the data schema an...
Mohammed Kayed, Chia-Hui Chang, Khaled F. Shaalan,...
CASCON
2007
112views Education» more  CASCON 2007»
13 years 6 months ago
Removing manually generated boilerplate from electronic texts: experiments with project Gutenberg e-books
Collaborative work on unstructured or semistructured documents, such as in literature corpora or source code, often involves agreed upon templates containing metadata. These templ...
Owen Kaser, Daniel Lemire
DOCENG
2007
ACM
13 years 8 months ago
Extracting reusable document components for variable data printing
Variable Data Printing (VDP) has brought new flexibility and dynamism to the printed page. Each printed instance of a specific class of document can now have different degrees of ...
Steven R. Bagley, David F. Brailsford, James A. Ol...
CICLING
2009
Springer
14 years 5 months ago
Business Specific Online Information Extraction from German Websites
This paper presents a system that uses the domain name of a German business website to locate its information pages (e.g. company profile, contact page, imprint) and then identifi...
Yeong Su Lee, Michaela Geierhos
SIGMOD
2003
ACM
190views Database» more  SIGMOD 2003»
13 years 10 months ago
Extracting Structured Data from Web Pages
Many web sites contain large sets of pages generated using a common template or layout. For example, Amazon lays out the author, title, comments, etc. in the same way in all its b...
Arvind Arasu, Hector Garcia-Molina