Sciweavers

85 search results - page 1 / 17
» Extracting unstructured data from template generated web doc...
Sort
View
CIKM
2003
Springer
13 years 10 months ago
Extracting unstructured data from template generated web documents
We propose a novel approach that identifies web page templates and extracts the unstructured data. Extracting only the body of the page and eliminating the template increases the ...
Ling Ma, Nazli Goharian, Abdur Chowdhury, Misun Ch...
CIKM
1998
Springer
13 years 9 months ago
Ontology-Based Extraction and Structuring of Information from Data-Rich Unstructured Documents
We present a new approach to extracting information from unstructured documents based on an application ontology that describes a domain of interest. Starting with such an ontolog...
David W. Embley, Douglas M. Campbell, Randy D. Smi...
AAAI
1997
13 years 6 months ago
Template-Based Information Mining from HTML Documents
Tools for mining information from data can create added value for the Internet. As the majority of electronic documents available over the network are in unstructured textual form...
Jane Yung-jen Hsu, Wen-tau Yih
COMAD
2009
13 years 5 months ago
Business Insight from Collection of Unstructured Formatted Documents with IBM Content Harvester
In this paper, we report the development and experiments of IBM Content Harvester (CH), a tool to analyze and recover templates and content from word processor created text docume...
Biplav Srivastava, Yuan-Chi Chang
SIGIR
2004
ACM
13 years 10 months ago
Query-related data extraction of hidden web documents
The larger amount of information on the Web is stored in document databases and is not indexed by general-purpose search engines (i.e., Google and Yahoo). Such information is dyna...
Yih-Ling Hedley, Muhammad Younas, Anne E. James, M...