Generalized Expectation Criteria for Bootstrapping Extractors using Record-Text Alignment

Traditionally, machine learning approaches to information extraction require human-annotated data, which can be costly and time-consuming to produce. In many cases, however, a database (DB) already exists whose schema is related to the desired output and whose records are related to the expected input text. We present a conditional random field (CRF) that aligns the tokens of a given DB record with its realization in text. The CRF model is trained with generalized expectation criteria, using only the available DB and unlabeled text. An annotation of the text induced from the inferred alignments is then used to train an information extractor. We evaluate our method on a citation extraction task in which alignments between DBLP database records and citation texts are used to train an extractor. Experimental results demonstrate an error reduction of 35% over a previous state-of-the-art method that uses heuristic alignments.
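
The step in which an alignment induces an annotation of the text can be illustrated with a minimal sketch. This is not the paper's model: the alignment here comes from `toy_align`, a hypothetical exact string-match stand-in, whereas the abstract describes a CRF trained with generalized expectation criteria. The names `Record`, `Alignment`, `toy_align`, and `induce_labels`, as well as the example record and citation tokens, are assumptions made for illustration only.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical types: a record maps each field name to that field's tokens;
# an alignment stores, for each text token, the (field, record-token index)
# it is aligned to, or None when the token is aligned to nothing.
Record = Dict[str, List[str]]
Alignment = List[Optional[Tuple[str, int]]]


def toy_align(text_tokens: List[str], record: Record) -> Alignment:
    """Stand-in aligner: case-insensitive exact match, first matching field wins.

    This only plays the role of the alignment model; it is not a CRF.
    """
    alignment: Alignment = []
    for token in text_tokens:
        match: Optional[Tuple[str, int]] = None
        for field, field_tokens in record.items():
            lowered = [t.lower() for t in field_tokens]
            if token.lower() in lowered:
                match = (field, lowered.index(token.lower()))
                break
        alignment.append(match)
    return alignment


def induce_labels(alignment: Alignment, other: str = "O") -> List[str]:
    """Project field names onto text tokens; unaligned tokens get `other`."""
    return [pair[0] if pair is not None else other for pair in alignment]


# Illustrative (made-up) record and citation text.
record = {
    "author": ["Kedar", "Bellare"],
    "title": ["Generalized", "Expectation", "Criteria"],
    "year": ["2009"],
}
text = ["Bellare", ",", "K.", "Generalized", "Expectation", "Criteria", "2009"]

labels = induce_labels(toy_align(text, record))
for token, label in zip(text, labels):
    print(f"{token}\t{label}")
```

In this sketch the abbreviated first name "K." stays unlabeled because exact matching cannot relate it to "Kedar". Variation of this kind between a record and its textual realization is one reason a learned alignment may be preferable to a fixed string-match rule. The resulting token/label pairs are the sort of induced annotation on which an extractor could then be trained.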

Kedar Bellare, Andrew McCallum

Citation Extraction Task | Conditional Random Field | EMNLP 2009 | Human Annotated Data | Natural Language Processing


Added	17 Feb 2011
Updated	17 Feb 2011
Type	Journal
Year	2009
Where	EMNLP
Authors	Kedar Bellare, Andrew McCallum
