Accurately and Reliably Extracting Data from the Web: A Machine Learning Approach


A critical problem in developing information agents for the Web is accessing data that is formatted for human use. We have developed a set of tools for extracting data from web sites and transforming it into a structured data format, such as XML. The resulting data can then be used to build new applications without having to deal with unstructured data. The advantages of our wrapping technology over previous work are the ability to learn highly accurate extraction rules, to verify the wrapper to ensure that the correct data continues to be extracted, and to automatically adapt to changes in the sites from which the data is being extracted.

Craig A. Knoblock, Kristina Lerman, Steven Minton, Ion Muslea


Critical Problem | DEBU 2000 | Information Agents | Structured Data Format |


» WebSets extracting sets of entities from the web using unsupervised information extraction

» Gene prediction in metagenomic fragments A large scale machine learning approach

» Web data extraction based on partial tree alignment

» Automatic extraction of titles from general documents using machine learning

» Coupling feature selection and machine learning methods for navigational query identificat...

» OpinionMiner a novel machine learning system for web opinion mining and extraction

» Learning to Extract Symbolic Knowledge from the World Wide Web

» Detecting Spam Bots in Online Social Networking Sites A Machine Learning Approach

Post Info

Added	18 Dec 2010
Updated	18 Dec 2010
Type	Journal
Year	2000
Where	DEBU
Authors	Craig A. Knoblock, Kristina Lerman, Steven Minton, Ion Muslea
