Wrapper induction: Efficiency and expressiveness

The Internet presents numerous sources of useful information: telephone directories, product catalogs, stock quotes, event listings, etc. Recently, many systems have been built that automatically gather and manipulate such information on a user's behalf. However, these resources are usually formatted for use by people (e.g., the relevant content is embedded in HTML pages), so extracting their content is difficult. Most systems use customized wrapper procedures to perform this extraction task. Unfortunately, writing wrappers is tedious and error-prone. As an alternative, we advocate wrapper induction, a technique for automatically constructing wrappers. In this article, we describe six wrapper classes, and use a combination of empirical and analytical techniques to evaluate the computational tradeoffs among them. We first consider expressiveness: how well the classes can handle actual Internet resources, and the extent to which wrappers in one class can mimic those in another. We ...
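To make "wrapper" and "wrapper induction" concrete, here is a minimal sketch in the spirit of the simplest class usually associated with this work, LR (left-right delimiters): a wrapper is one (left, right) delimiter pair per attribute, and executing it means skipping past each left delimiter and extracting text up to the matching right delimiter, tuple after tuple. The induction step is an assumption, not the article's learning algorithm: it uses a simple heuristic (longest common suffix of the text before each labeled value, longest common prefix of the text after it) and keeps the result only if it reproduces the labels. The names `execute_lr`, `induce_lr`, `locate`, and the sample country-code page are hypothetical, chosen for illustration.

```python
import os
from typing import List, Sequence, Tuple

# Hypothetical representation: one (left, right) delimiter pair per attribute.
Delims = List[Tuple[str, str]]


def execute_lr(page: str, delims: Delims) -> List[Tuple[str, ...]]:
    """Run an LR-style wrapper: for each attribute, skip past its left
    delimiter and extract text up to its right delimiter; repeat per tuple."""
    if not delims or any(not lft or not rgt for lft, rgt in delims):
        raise ValueError("every delimiter must be a non-empty string")
    tuples, pos = [], 0
    while True:
        row = []
        for left, right in delims:
            start = page.find(left, pos)
            if start == -1:
                return tuples  # no further tuples; a partial row is dropped
            start += len(left)
            end = page.find(right, start)
            if end == -1:
                return tuples
            row.append(page[start:end])
            # Resume at the right delimiter: it may overlap the next left one.
            pos = end
        tuples.append(tuple(row))


def common_prefix(strings: Sequence[str]) -> str:
    return os.path.commonprefix(list(strings))


def common_suffix(strings: Sequence[str]) -> str:
    return os.path.commonprefix([s[::-1] for s in strings])[::-1]


def locate(page: str, labels: Sequence[Sequence[str]]) -> List[Tuple[int, int]]:
    """Map labeled values to (start, end) offsets in document order.
    Simplification: uses the first occurrence after the previous value."""
    spans, pos = [], 0
    for row in labels:
        for value in row:
            start = page.find(value, pos)
            if start == -1:
                raise ValueError(f"label {value!r} not found in page")
            spans.append((start, start + len(value)))
            pos = start + len(value)
    return spans


def induce_lr(page: str, labels: Sequence[Sequence[str]]) -> Delims:
    """Heuristic: for each attribute, left = longest common suffix of the text
    before its values, right = longest common prefix of the text after them."""
    k = len(labels[0])
    spans = locate(page, labels)
    delims = []
    for a in range(k):
        before, after = [], []
        for i in range(a, len(spans), k):  # every occurrence of attribute a
            start, end = spans[i]
            prev_end = spans[i - 1][1] if i > 0 else 0
            next_start = spans[i + 1][0] if i + 1 < len(spans) else len(page)
            before.append(page[prev_end:start])
            after.append(page[end:next_start])
        delims.append((common_suffix(before), common_prefix(after)))
    # Accept only delimiters that reproduce the labeled tuples on this page.
    if execute_lr(page, delims) != [tuple(row) for row in labels]:
        raise ValueError("no consistent LR delimiters found by this heuristic")
    return delims


# Hypothetical training page and labels.
PAGE = ("<HTML><TITLE>Some Country Codes</TITLE><BODY>"
        "<B>Congo</B> <I>242</I><BR>"
        "<B>Egypt</B> <I>20</I><BR>"
        "<B>Belize</B> <I>501</I><BR>"
        "</BODY></HTML>")
LABELS = [("Congo", "242"), ("Egypt", "20"), ("Belize", "501")]

if __name__ == "__main__":
    delims = induce_lr(PAGE, LABELS)
    print(delims)
    print(execute_lr("<HTML><BODY><B>Spain</B> <I>34</I><BR></BODY></HTML>", delims))
```

Choosing the longest common affix gives the most specific delimiters, and the final self-check rejects any choice that mis-extracts on the training page. A learner that searches candidate delimiters more systematically, or a richer wrapper class, would be needed for pages where a delimiter also appears in unrelated parts of the document.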
Nicholas Kushmerick
Added 17 Dec 2010
Updated 17 Dec 2010
Type Journal
Year 2000
Where AI
Authors Nicholas Kushmerick