Refining Information Extraction Rules using Data Provenance


Developing high-quality information extraction (IE) rules, or extractors, is an iterative and primarily manual process that is extremely time-consuming and error-prone. In each iteration, the outputs of the extractor are examined, and the erroneous ones are used to drive the refinement of the extractor in the next iteration. Data provenance explains the origins of an output data item and how it has been transformed through a query. As such, one can expect data provenance to be valuable in understanding and debugging complex IE rules. In this paper we discuss how data provenance can be used beyond understanding and debugging, to automatically refine IE rules. In particular, we give an overview of the main ideas behind a recent provenance-based solution that suggests a ranked list of refinements to an extractor, aimed at increasing its precision, and we outline several related directions for future research.
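The refinement loop described in the abstract can be illustrated with a small sketch. It is not the algorithm from the paper: it is a simplified stand-in that assumes each extractor output records, as its provenance, the set of dictionary entries that produced it, and that a user has labeled each output as correct or incorrect. The names `Output`, `precision`, `rank_refinements`, and the sample data are all hypothetical. Each candidate refinement removes one provenance item, which drops every output that depends on it, and candidates are ranked by the precision gained.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Output:
    """One extracted span with its (assumed) provenance and a user label."""
    text: str
    provenance: frozenset  # hypothetical: dictionary entries that produced it
    correct: bool


def precision(outputs):
    """Fraction of outputs labeled correct; 0.0 for an empty list."""
    if not outputs:
        return 0.0
    return sum(o.correct for o in outputs) / len(outputs)


def rank_refinements(outputs):
    """Rank 'remove this provenance item' refinements by precision gain.

    Returns (item, gain, lost_correct) tuples, best first. Ties on gain
    are broken by fewer correct outputs lost, then by item name.
    """
    base = precision(outputs)
    candidates = set().union(*(o.provenance for o in outputs))
    ranked = []
    for item in candidates:
        kept = [o for o in outputs if item not in o.provenance]
        lost_correct = sum(o.correct for o in outputs if item in o.provenance)
        ranked.append((item, precision(kept) - base, lost_correct))
    ranked.sort(key=lambda r: (-r[1], r[2], r[0]))
    return ranked


# Hypothetical labeled outputs of a toy "first name + last name" person rule.
OUTPUTS = [
    Output("John Smith", frozenset({"first:John", "last:Smith"}), True),
    Output("Mary Jones", frozenset({"first:Mary", "last:Jones"}), True),
    Output("Washington Post", frozenset({"first:Washington", "last:Post"}), False),
    Output("Washington Irving", frozenset({"first:Washington", "last:Irving"}), True),
    Output("Morgan Stanley", frozenset({"first:Morgan", "last:Stanley"}), False),
]

ranked = rank_refinements(OUTPUTS)
for item, gain, lost in ranked[:3]:
    print(f"remove {item}: precision gain {gain:+.2f}, correct outputs lost {lost}")
```

In this toy data, removing an entry that only contributes to wrong outputs ranks above removing `first:Washington`, which would also discard a correct result. This mirrors, in a very reduced form, the trade-off a ranked list of refinements has to expose.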

Bin Liu 0002, Laura Chiticariu, Vivian Chu, H. V.

Complex IE Rules | DEBU 2010 | Distributed And Parallel Computing | IE Rules | Provenance

» The SystemT IDE: an integrated development environment for information extraction rules

» On the provenance of non-answers to queries over extracted data

» A two-phase rule generation and optimization approach for wrapper generation

» Rough Set Based Information Retrieval from Argumentative Data Points in Weblogs

» Rule-Based Information Extraction for Structured Data Acquisition using TextMarker

» Provenance trails in the Wings/Pegasus system

» A Market-based Rule Learning System

» ArcAngelC: a Refinement Tactic Language for Circus

Post Info

Added	01 Mar 2011
Updated	01 Mar 2011
Type	Journal
Year	2010
Where	DEBU
Authors	Bin Liu 0002, Laura Chiticariu, Vivian Chu, H. V. Jagadish, Frederick Reiss
