Sciweavers

ICWE
2007
Springer

Fixing Weakly Annotated Web Data Using Relational Models

In this paper, we present a fast and scalable Bayesian model for improving weakly annotated data, which is typically generated by a (semi-)automated information extraction (IE) system from Web documents. Weakly annotated data suffers from two major problems: it (i) might contain incorrect ontological role assignments, and (ii) might have many missing attributes. Our experimental evaluations with the TAP and RoadRunner data sets, and a collection of 20,000 home pages from university, shopping, and sports Web sites, indicate that the model described here can improve the accuracy of role assignments from 40% to 85% for template-driven sites and from 68% to 87% for non-template-driven sites. The Bayesian model is also shown to be useful for improving the performance of IE systems by informing them with additional domain information.
Fatih Gelgi, Srinivas Vadrevu, Hasan Davulcu
Added 08 Jun 2010
Updated 08 Jun 2010
Type Conference
Year 2007
Where ICWE
Authors Fatih Gelgi, Srinivas Vadrevu, Hasan Davulcu