ICDE 2010, IEEE

On supporting effective web extraction

15 years 11 months ago

On supporting effective web extraction

Download rosaec.snu.ac.kr

Commercial tuple extraction systems have had some success in extracting tuples. They treat HTML pages as tree structures and use XPath queries to find the attributes of tuples in those pages. However, such systems are vulnerable to even small changes in the web pages. In this paper, we propose a robust tuple extraction system that uses the spatial relationships among elements rather than the XPath queries of the elements. Our system treats the elements of a rendered page as spatial objects in 2-D space and executes spatial joins to extract the target elements. Humans also identify an element in a web page by its relative spatial location. A system that extracts elements by their spatial relationships could therefore be as robust as manual extraction, and it is far more robust than existing tuple extraction systems.
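To make the idea of extracting by spatial relationship concrete, here is a minimal sketch in Python. It is not the authors' system. The abstract does not specify the join predicates, index structures, or rendering pipeline. This sketch assumes each rendered element has already been reduced to its text and an axis-aligned bounding box, for example as reported by a browser's layout engine. The `Box` type, the "right of, on the same row" predicate, and the function names are all hypothetical. The join is a plain nested-loop spatial join that pairs each label with its nearest right-hand neighbour.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Box:
    """Hypothetical rendered element: its text plus a bounding box in page coordinates."""
    text: str
    left: float
    top: float
    right: float
    bottom: float


def vertically_overlaps(a: Box, b: Box) -> bool:
    # True when the two boxes share part of a horizontal band (the same visual row).
    return a.top < b.bottom and b.top < a.bottom


def right_of(anchor: Box, candidates: List[Box]) -> Optional[Box]:
    """Return the nearest box that starts at or after anchor's right edge on the same row."""
    row = [c for c in candidates
           if c is not anchor
           and c.left >= anchor.right
           and vertically_overlaps(anchor, c)]
    return min(row, key=lambda c: c.left - anchor.right, default=None)


def spatial_join(labels: List[Box], elements: List[Box]) -> List[Tuple[str, str]]:
    """Nested-loop spatial join: pair each label with the element just to its right."""
    pairs = []
    for label in labels:
        value = right_of(label, elements)
        if value is not None:
            pairs.append((label.text, value.text))
    return pairs


# Illustrative rendered page (made-up coordinates): two label/value rows and a button.
page = [
    Box("Title:", 10, 10, 60, 30),
    Box("Database Systems", 70, 10, 250, 30),
    Box("Price:", 10, 40, 60, 60),
    Box("$42.00", 70, 40, 130, 60),
    Box("Add to cart", 300, 40, 380, 60),
]
labels = [b for b in page if b.text.endswith(":")]
pairs = spatial_join(labels, page)
print(pairs)  # [('Title:', 'Database Systems'), ('Price:', '$42.00')]
```

The join looks only at rendered geometry. Suppose the page is changed by wrapping the price in an extra `<div>`. If the value still appears in the same place on screen, the pairs come out the same, whereas an absolute XPath to that element would no longer match. A practical system would likely replace the nested loop with an indexed spatial join (for example, one backed by an R-tree) on large pages.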

Wook-Shin Han, Wooseong Kwak, Hwanjo Yu

Database | ICDE 2010 | Spatial | Tuple Extraction | Tuple Extraction Systems

Related Content

» Web Information Extraction and User Modeling Towards Closing the Gap

» SPARQ2L towards support for subgraph extraction queries in rdf databases

» WebGISRBDL A Rare Book Digital Library Supporting SpatioTemporary Retrieval

» Learning effective ranking functions for newsgroup search

» Extraction and classification of dense communities in the web

» Effective Web data extraction with standard XML technologies

» Supporting Natural Language Processing with Background Knowledge Coreference Resolution Ca...

» COMMIX towards effective web information extraction integration and query answering

» Effective techniques for automatic extraction of Web publications

Post Info

Added	17 May 2010
Updated	17 May 2010
Type	Conference
Year	2010
Where	ICDE
Authors	Wook-Shin Han, Wooseong Kwak, Hwanjo Yu