Sciweavers

85 search results - page 6 / 17
» Extracting unstructured data from template generated web doc...
WWW
2009
ACM
15 years 4 months ago
Bootstrapped extraction of class attributes
As an alternative to previous studies on extracting class attributes from unstructured text, which consider either Web documents or query logs as the source of textual data, a boo...
Joseph Reisinger, Marius Pasca
ICWE
2007
Springer
15 years 3 months ago
Fixing Weakly Annotated Web Data Using Relational Models
In this paper, we present a fast and scalable Bayesian model for improving weakly annotated data – which is typically generated by a (semi) automated information extraction (IE) ...
Fatih Gelgi, Srinivas Vadrevu, Hasan Davulcu
CIKM
2008
Springer
14 years 11 months ago
Mapping enterprise entities to text segments
Today, valuable business information is increasingly stored as unstructured data (documents, emails, etc.). For example, documents exchanged between business partners capture info...
Falk Brauer, Alexander Löser, Hong-Hai Do
ADC
2006
Springer
15 years 3 months ago
A two-phase rule generation and optimization approach for wrapper generation
Web information extraction is a fundamental issue for web information management and integrations. A common approach is to use wrappers to extract data from web pages or documents...
Yanan Hao, Yanchun Zhang
WWW
2009
ACM
15 years 10 months ago
Extracting article text from the web with maximum subsequence segmentation
Much of the information on the Web is found in articles from online news outlets, magazines, encyclopedias, review collections, and other sources. However, extracting this content...
Jeff Pasternack, Dan Roth