
FLAIRS
2004

Towards a Universal Web Wrapper

The wealth of information on the World Wide Web has created much interest in systems that integrate information from multiple sites. We describe a universal wrapper machine that can learn to extract information from the web given only a set of general rules describing the data domain. It cleanly separates site-independent and site-specific knowledge from the execution implementation. Site-independent knowledge is expressed in user-supplied domain rules, while site-specific knowledge is expressed in automatically generated context-free grammars that describe site structures. The two are combined by using the domain rules to semantically interpret the parse trees that the grammars produce. The resulting declarative wrapper specifications are easy for humans to understand and can be executed to perform information extraction. Once extracted, tuples can be queried by external agents using a high-level agent communication language.
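The division of labor described in the abstract can be illustrated with a small sketch. This is not the authors' system: the grammar below is hand-written rather than automatically generated, and every name in it (`DOMAIN_RULES`, `GRAMMAR`, `tokenize`, `parse`, `classify`, `extract`) and the sample page are assumptions made for illustration. It shows only the general idea: a site-specific context-free grammar parses the page structure, and site-independent domain rules then label the text leaves of the parse tree to produce tuples.

```python
import re

# Site-independent domain rules (hypothetical): field name -> recognizer.
DOMAIN_RULES = {
    "price": re.compile(r"^\$\d+(\.\d{2})?$"),
    "title": re.compile(r"^[A-Za-z][\w\s:,'-]*$"),
}

# Site-specific grammar (hand-written here for illustration).
# Terminals are literal tags or the token class TEXT.
GRAMMAR = {
    "page":  [["<ul>", "items", "</ul>"]],
    "items": [["item", "items"], []],          # zero or more items
    "item":  [["<li>", "TEXT", "<b>", "TEXT", "</b>", "</li>"]],
}

def tokenize(html):
    """Split markup into tag tokens and stripped text tokens."""
    parts = re.split(r"(<[^>]+>)", html)
    return [p.strip() for p in parts if p.strip()]

def parse(symbol, tokens, pos):
    """Recursive descent with ordered choice.

    Returns (tree, next_pos) on success, or None on failure.
    Terminal nodes are (symbol, token); nonterminals are (symbol, [children]).
    """
    if symbol not in GRAMMAR:  # terminal
        if pos >= len(tokens):
            return None
        tok = tokens[pos]
        if symbol == "TEXT":
            return (("TEXT", tok), pos + 1) if not tok.startswith("<") else None
        return ((symbol, tok), pos + 1) if tok == symbol else None
    for production in GRAMMAR[symbol]:
        children, cur = [], pos
        for child_symbol in production:
            result = parse(child_symbol, tokens, cur)
            if result is None:
                break
            child, cur = result
            children.append(child)
        else:  # every symbol in this production matched
            return (symbol, children), cur
    return None

def classify(text):
    """Return the first domain field whose rule accepts the text, else None."""
    for field, pattern in DOMAIN_RULES.items():
        if pattern.match(text):
            return field
    return None

def extract(tree):
    """Interpret the parse tree: emit one dict per `item` node."""
    symbol, body = tree
    if not isinstance(body, list):  # terminal leaf outside an item
        return []
    if symbol == "item":
        record = {}
        for child_symbol, child_body in body:
            if child_symbol == "TEXT":
                field = classify(child_body)
                if field:
                    record[field] = child_body
        return [record]
    tuples = []
    for child in body:
        tuples.extend(extract(child))
    return tuples

html = "<ul><li>Dune <b>$9.99</b></li><li>Emma <b>$7.50</b></li></ul>"
tree, end = parse("page", tokenize(html), 0)
print(extract(tree))
```

In this sketch, supporting a different site would mean swapping `GRAMMAR` while leaving `DOMAIN_RULES` untouched, which mirrors the separation the abstract describes.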
Theodore W. Hong, Keith L. Clark
Added 30 Oct 2010
Updated 30 Oct 2010
Type Conference
Year 2004
Where FLAIRS