Sampling, information extraction and summarisation of Hidden Web databases

Hidden Web databases maintain a collection of specialised documents, which are dynamically generated in response to users' queries. The majority of these documents are generated through Web page templates, which contain information that is often irrelevant to queries. In this paper, we present a system designed to detect and extract query-related information from documents sampled from databases. The proposed system, 2PS, is based on a two-phase framework for the sampling, extraction and summarisation of Hidden Web documents. In the first phase, 2PS queries databases with random terms selected from those contained in their search interface pages and the subsequently retrieved documents
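The first phase can be pictured as query-based sampling. The sketch below illustrates that idea under stated assumptions; it is not the 2PS implementation. `search` is a hypothetical stand-in for submitting a single-term query to a database's search form and getting back result documents as plain text. The tokeniser, `max_queries`, `docs_per_query` and the stopping rule are illustrative choices and are not taken from the paper. The one part that follows the abstract is where query terms come from: first the search interface page, then the documents retrieved so far.

```python
import random
import re
from typing import Callable

# Hypothetical, simplistic tokeniser: lower-case alphabetic tokens of 3+ letters.
TOKEN = re.compile(r"[a-z]{3,}")


def terms(text: str) -> set[str]:
    """Return the candidate query terms found in a piece of text."""
    return set(TOKEN.findall(text.lower()))


def sample_database(
    search: Callable[[str], list[str]],  # hypothetical: query term -> result documents
    interface_page: str,
    max_queries: int = 50,
    docs_per_query: int = 4,
    seed: int = 0,
) -> dict[str, str]:
    """Sample documents by issuing random single-term queries.

    Returns a de-duplicated mapping: document text -> the query term that first retrieved it.
    """
    rng = random.Random(seed)
    candidates = terms(interface_page)  # initial terms come from the search interface page
    used: set[str] = set()
    sample: dict[str, str] = {}
    for _ in range(max_queries):
        pool = sorted(candidates - used)  # sorted so a fixed seed gives a reproducible run
        if not pool:
            break  # no unused terms left to try
        term = rng.choice(pool)
        used.add(term)
        for doc in search(term)[:docs_per_query]:
            if doc not in sample:
                sample[doc] = term
                candidates |= terms(doc)  # retrieved documents extend the term pool
    return sample
```

Terms that match nothing, such as interface labels, cost one query and are then discarded. Terms taken from retrieved documents let the sampler reach documents that the interface page alone would not.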

Yih-Ling Hedley, Muhammad Younas, Anne E. James, Mark Sanderson

DKE 2006 | Documents | Web Databases | Web Page

» HDSampler: revealing data behind web form interfaces

» A random walk approach to sampling hidden databases

» Organizing Hidden-Web Databases by Clustering Visible Web Documents

» Crawling the Hidden Web

» Selecting actions for resource-bounded information extraction using reinforcement learning

» On-the-Fly Integration and Ad Hoc Querying of Life Sciences Databases Using LifeDB

» Mining search engine query logs via suggestion sampling

Post Info

Added	11 Dec 2010
Updated	11 Dec 2010
Type	Journal
Year	2006
Where	DKE
Authors	Yih-Ling Hedley, Muhammad Younas, Anne E. James, Mark Sanderson

Sciweavers