Sciweavers

85 search results - page 11 / 17
» Extracting unstructured data from template generated web doc...
WWW
2001
ACM
15 years 10 months ago
IEPAD: information extraction based on pattern discovery
The research in information extraction (IE) regards the generation of wrappers that can extract particular information from semistructured Web documents. Similar to compiler gener...
Chia-Hui Chang, Shao-Chen Lui
DILS
2009
Springer
15 years 4 months ago
Site-Wide Wrapper Induction for Life Science Deep Web Databases
We present a novel approach to automatic information extraction from Deep Web Life Science databases using wrapper induction. Traditional wrapper induction techniques focus on lear...
Saqib Mir, Steffen Staab, Isabel Rojas
SIGMOD
2010
ACM
14 years 9 months ago
I4E: interactive investigation of iterative information extraction
Information extraction systems are increasingly being used to mine structured information from unstructured text documents. A commonly used unsupervised technique is to build iter...
Anish Das Sarma, Alpa Jain, Divesh Srivastava
WWW
2009
ACM
15 years 10 months ago
Exploiting web search to generate synonyms for entities
Tasks recognizing named entities such as products, people names, or locations from documents have recently received significant attention in the literature. Many solutions to thes...
Surajit Chaudhuri, Venkatesh Ganti, Dong Xin
WEBDB
1999
Springer
15 years 1 month ago
Web Ecology: Recycling HTML Pages as XML Documents Using W4F
In this paper we present the World-Wide Web Wrapper Factory (W4F), a Java toolkit to generate wrappers for Web data sources. Some key features of W4F are an expressive language to...
Arnaud Sahuguet, Fabien Azavant