Sciweavers

EMNLP
2009

Generalized Expectation Criteria for Bootstrapping Extractors using Record-Text Alignment

Traditionally, machine learning approaches for information extraction require human-annotated data, which can be costly and time-consuming to produce. However, in many cases a database (DB) already exists with a schema related to the desired output and records related to the expected input text. We present a conditional random field (CRF) that aligns the tokens of a given DB record with its realization in text. The CRF model is trained with generalized expectation criteria, using only the available DB and unlabeled text. An annotation of the text induced from the inferred alignments is then used to train an information extractor. We evaluate our method on a citation extraction task in which alignments between DBLP database records and citation texts are used to train an extractor. Experimental results demonstrate an error reduction of 35% over a previous state-of-the-art method that uses heuristic alignments.
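The step in which an annotation of the text is induced from alignments can be illustrated with a minimal sketch. It is not the authors' implementation: the alignment below is written by hand as a stand-in for the output of the alignment CRF, the function names `flatten_record` and `project_labels` are hypothetical, and the use of an "O" label for unaligned tokens is an assumption.

```python
from typing import Dict, List, Optional, Tuple


def flatten_record(record: Dict[str, str]) -> List[Tuple[str, str]]:
    """Turn a DB record into a flat list of (token, field name) pairs."""
    return [(tok, field) for field, value in record.items() for tok in value.split()]


def project_labels(
    text_tokens: List[str],
    record_tokens: List[Tuple[str, str]],
    alignment: List[Optional[int]],
    default: str = "O",
) -> List[str]:
    """Label each text token with the field of the record token it aligns to.

    alignment[i] is the index of the record token aligned to text token i,
    or None when the text token has no counterpart in the record.
    """
    if len(alignment) != len(text_tokens):
        raise ValueError("alignment must have one entry per text token")
    labels = []
    for j in alignment:
        labels.append(default if j is None else record_tokens[j][1])
    return labels


# Toy record and citation string, invented for illustration only.
record = {"author": "K. Bellare", "title": "Record Text Alignment", "year": "2009"}
text = "Bellare , K. ( 2009 ) Record Text Alignment .".split()

record_tokens = flatten_record(record)

# Hand-written alignment (assumption): in the described method this would be
# inferred by the alignment model rather than supplied by hand.
alignment = [1, None, 0, None, 5, None, 2, 3, 4, None]

labels = project_labels(text, record_tokens, alignment)
for token, label in zip(text, labels):
    print(f"{token}\t{label}")
```

The resulting token and label pairs are the kind of induced annotation that could then serve as training data for a sequence-labeling extractor.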
Kedar Bellare, Andrew McCallum
Added 17 Feb 2011
Updated 17 Feb 2011
Type Conference
Year 2009
Where EMNLP
Authors Kedar Bellare, Andrew McCallum