Query-related data extraction of hidden web documents


Most information on the Web is stored in document databases and is not indexed by general-purpose search engines (e.g., Google and Yahoo). Such information is dynamically generated by querying these databases, which are referred to as Hidden Web databases. Documents returned in response to a user query are typically presented using template-generated Web pages. This paper proposes a novel approach that identifies Web page templates by analysing the textual contents and the adjacent tag structures of a document in order to extract query-related data. Preliminary results demonstrate that our approach effectively detects templates and retrieves data with high recall and precision.

Categories and Subject Descriptors: H.3.5 [Information Storage and Retrieval]: Online Information Services – Web-based services.

General Terms: Performance, Experimentation.
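The idea in the abstract, separating template text from query-related text by looking at text segments together with their adjacent tags, can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names `segments` and `extract_query_data`, the `min_share` parameter, and the rule "a segment that recurs on every sampled page is template" are all assumptions made for illustration only.

```python
import re
from collections import Counter

# Matches either one tag or one run of text between tags.
TAG_OR_TEXT = re.compile(r"(<[^>]+>)|([^<]+)")


def segments(html):
    """Return (preceding_tag, text, following_tag) triples for a page.

    Hypothetical helper: tags are reduced to their names so that
    attribute differences do not matter.
    """
    tokens = []
    for tag, text in TAG_OR_TEXT.findall(html):
        if tag:
            name = re.match(r"<\s*(/?\w+)", tag)
            tokens.append(("tag", "<" + name.group(1).lower() + ">" if name else tag))
        elif text.strip():
            # Normalise whitespace inside the text segment.
            tokens.append(("text", " ".join(text.split())))

    triples = []
    for i, (kind, value) in enumerate(tokens):
        if kind != "text":
            continue
        before = tokens[i - 1][1] if i > 0 and tokens[i - 1][0] == "tag" else ""
        after = tokens[i + 1][1] if i + 1 < len(tokens) and tokens[i + 1][0] == "tag" else ""
        triples.append((before, value, after))
    return triples


def extract_query_data(pages, min_share=1.0):
    """Drop segments that recur across pages; keep the rest per page.

    Assumption: a (tag, text, tag) triple seen on at least `min_share`
    of the sampled pages belongs to the template.
    """
    per_page = [segments(p) for p in pages]
    counts = Counter()
    for triples in per_page:
        counts.update(set(triples))  # count each triple once per page
    threshold = min_share * len(pages)
    template = {t for t, c in counts.items() if c >= threshold}
    return [[text for (b, text, a) in triples if (b, text, a) not in template]
            for triples in per_page]
```

Given two result pages from the same database that share a heading and a footer but differ in one table cell, `extract_query_data` would return only the differing cell text for each page. Real pages would need a proper HTML parser and a more tolerant matching rule than exact equality.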

Yih-Ling Hedley, Muhammad Younas, Anne E. James, Mark Sanderson


General-purpose Search Engines | Hidden Web Databases | SIGIR 2004 | Web Page


» Sampling information extraction and summarisation of Hidden Web databases

» Automatic Hidden Web Database Classification

» Distributed Search over the Hidden Web Hierarchical Database Sampling and Selection

» On the Automatic Extraction of Data from the Hidden Web

» Syntactic Folding and its Application to the Information Extraction from Web Pages

» iCube A ToolSet for the Dynamic Extraction and Integration of Web Data Content

» Extracting unstructured data from template generated web documents

» Bootstrapping Information Extraction from Field Books

Post Info

Added	30 Jun 2010
Updated	30 Jun 2010
Type	Conference
Year	2004
Where	SIGIR
Authors	Yih-Ling Hedley, Muhammad Younas, Anne E. James, Mark Sanderson
