Sciweavers

Search results for: Adaptive record extraction from web pages
WWW
2003
ACM
Annotating Web pages for the needs of Web Information Extraction Applications
This paper outlines our approach to the creation of annotated corpora for the purposes of Web Information Extraction, and presents the Web Annotation tool. This tool enables the a...
Georgios Sigletos, Dimitra Farmakiotou, Konstantin...
BMCBI
2011
Extracting scientific articles from a large digital archive: BioStor and the Biodiversity Heritage Library
Background: The Biodiversity Heritage Library (BHL) is a large digital archive of legacy biological literature, comprising over 31 million pages scanned from books, monographs, an...
Roderic D. M. Page
PAMI
2007
Recognition of Pornographic Web Pages by Classifying Texts and Images
With the rapid development of the World Wide Web, people benefit more and more from the sharing of information. However, Web pages with obscene, harmful, or illegal content can ...
Weiming Hu, Ou Wu, Zhouyao Chen, Zhouyu Fu, Stephe...
ICDE
2006
IEEE
Segmentation of Publication Records of Authors from the Web
Publication records are often found in the authors' personal home pages. If such a record is partitioned into a list of semantic fields of authors, title, date, etc., the uns...
Wei Zhang, Clement T. Yu, Neil R. Smalheiser, Vetl...
CIKM
2005
Springer
ViPER: augmenting automatic information extraction with visual perceptions
In this paper we address the problem of unsupervised Web data extraction. We show that unsupervised Web data extraction becomes feasible when supposing pages that are made up of r...
Kai Simon, Georg Lausen