Query-related data extraction of hidden web documents


The larger part of the information on the Web is stored in document databases and is not indexed by general-purpose search engines (e.g., Google and Yahoo). Such information is generated dynamically by querying these databases, which are referred to as Hidden Web databases. Documents returned in response to a user query are typically presented using template-generated Web pages. This paper proposes a novel approach that identifies Web page templates by analysing the textual contents and the adjacent tag structures of a document in order to extract query-related data. Preliminary results demonstrate that our approach effectively detects templates and retrieves data with high recall and precision.

Categories and Subject Descriptors: H.3.5 [Information Storage and Retrieval]: Online Information Services – Web-based services.

General Terms: Performance, Experimentation.
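The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that a text segment which recurs with the same neighbouring tags on every result page from one database belongs to the template, and that whatever remains is query-related data. The names `SegmentParser`, `extract_segments`, and `split_template_and_data`, as well as the two sample pages, are hypothetical and chosen for illustration.

```python
from collections import Counter
from html.parser import HTMLParser


class SegmentParser(HTMLParser):
    """Collect (previous_tag, text, next_tag) triples from an HTML page."""

    def __init__(self):
        super().__init__()
        self.segments = []      # completed triples
        self._last_tag = None   # most recent tag seen before the current text
        self._pending = None    # (previous_tag, text) awaiting its next tag

    def _close_pending(self, tag):
        # A tag ends the current text run; record the run with both neighbours.
        if self._pending is not None:
            prev_tag, text = self._pending
            self.segments.append((prev_tag, text, tag))
            self._pending = None
        self._last_tag = tag

    def handle_starttag(self, tag, attrs):
        self._close_pending("<" + tag + ">")

    def handle_endtag(self, tag):
        self._close_pending("</" + tag + ">")

    def handle_data(self, data):
        text = " ".join(data.split())  # normalise whitespace
        if not text:
            return
        if self._pending is None:
            self._pending = (self._last_tag, text)
        else:
            # Text delivered in several chunks: join it into one run.
            prev_tag, earlier = self._pending
            self._pending = (prev_tag, earlier + " " + text)


def extract_segments(html):
    parser = SegmentParser()
    parser.feed(html)
    parser.close()
    parser._close_pending(None)  # flush trailing text, if any
    return parser.segments


def split_template_and_data(pages):
    """Assumption: segments present on every page are template content."""
    per_page = [extract_segments(page) for page in pages]
    counts = Counter()
    for segments in per_page:
        counts.update(set(segments))
    template = {seg for seg, n in counts.items() if n == len(pages)}
    data = [[seg[1] for seg in segments if seg not in template]
            for segments in per_page]
    return template, data


# Hypothetical result pages returned for two different queries.
page_a = "<html><body><h1>Results</h1><p>Heart disease study</p><a>Next</a></body></html>"
page_b = "<html><body><h1>Results</h1><p>Diabetes trial report</p><a>Next</a></body></html>"

template, data = split_template_and_data([page_a, page_b])
print(sorted(text for _, text, _ in template))
print(data)
```

Requiring a segment to appear on every page is a deliberately strict rule chosen to keep the sketch short; a real system would likely need a softer threshold and some handling of text that merely looks similar across pages.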

Yih-Ling Hedley, Muhammad Younas, Anne E. James, Mark Sanderson


General-purpose Search Engines | Hidden Web Databases | SIGIR 2004 | Web Page


» Sampling information extraction and summarisation of Hidden Web databases

» Automatic Hidden Web Database Classification

» Distributed Search over the Hidden Web Hierarchical Database Sampling and Selection

» On the Automatic Extraction of Data from the Hidden Web

» Syntactic Folding and its Application to the Information Extraction from Web Pages

» iCube A ToolSet for the Dynamic Extraction and Integration of Web Data Content

» Extracting unstructured data from template generated web documents

» Bootstrapping Information Extraction from Field Books

Post Info

Added	30 Jun 2010
Updated	30 Jun 2010
Type	Conference
Year	2004
Where	SIGIR
Authors	Yih-Ling Hedley, Muhammad Younas, Anne E. James, Mark Sanderson
