EMNLP
2009

Character-level Analysis of Semi-Structured Documents for Set Expansion

Set expansion refers to expanding a partial set of "seed" objects into a more complete set. One system that does set expansion is SEAL (Set Expander for Any Language), which expands entities automatically by utilizing resources from the Web in a language-independent fashion. In this paper, we illustrate in detail the construction of character-level wrappers for set expansion as implemented in SEAL. We also evaluate several kinds of wrappers for set expansion and show that character-based wrappers perform better than HTML-based wrappers. In addition, we demonstrate a technique that extends SEAL to learn binary relational concepts (e.g., "x is the mayor of the city y") from only two seeds. Finally, we show that the extended SEAL performs well on our evaluation datasets, which include English and Chinese, thus demonstrating language independence.
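To make the idea of a character-level wrapper concrete, here is a minimal sketch, not SEAL's actual implementation. It assumes a wrapper is a pair of character strings (a left and a right context) that brackets one occurrence of every seed in a page; anything else bracketed by the same pair is treated as a candidate entity. The function names (`build_wrappers`, `extract`) and the sample page are hypothetical, and ranking of candidates is omitted.

```python
import re
from itertools import product


def common_prefix(strings):
    """Longest string that every element of `strings` starts with."""
    prefix = strings[0]
    for s in strings[1:]:
        i = 0
        while i < min(len(prefix), len(s)) and prefix[i] == s[i]:
            i += 1
        prefix = prefix[:i]
    return prefix


def common_suffix(strings):
    """Longest string that every element of `strings` ends with."""
    return common_prefix([s[::-1] for s in strings])[::-1]


def build_wrappers(doc, seeds):
    """Return (left, right) character contexts shared by one occurrence of each seed.

    Simplifying assumption: every combination of seed occurrences is tried,
    which is only practical for small pages and few seeds.
    """
    positions = [
        [m.start() for m in re.finditer(re.escape(seed), doc)] for seed in seeds
    ]
    if any(not p for p in positions):
        return set()  # a seed is missing from the page, so no wrapper exists

    wrappers = set()
    for combo in product(*positions):
        lefts = [doc[:start] for start in combo]
        rights = [doc[start + len(seed):] for start, seed in zip(combo, seeds)]
        left, right = common_suffix(lefts), common_prefix(rights)
        if left and right:  # require non-empty context on both sides
            wrappers.add((left, right))
    return wrappers


def extract(doc, wrappers):
    """Collect every string bracketed by some wrapper's left and right context."""
    found = set()
    for left, right in wrappers:
        # The right context is a lookahead so adjacent items may share characters.
        pattern = re.escape(left) + r"(.+?)(?=" + re.escape(right) + r")"
        found.update(re.findall(pattern, doc))
    return found


# Hypothetical page fragment and seeds, for illustration only.
page = '<ul><li class="c">Ford</li><li class="c">Nissan</li><li class="c">Toyota</li></ul>'
candidates = extract(page, build_wrappers(page, ["Ford", "Toyota"]))
print(sorted(candidates))
```

Because the contexts are compared character by character rather than tag by tag, the sketch makes no assumption about the page's markup or the language of the entities, which is the property the abstract attributes to character-based wrappers.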
Richard C. Wang, William W. Cohen
Added 17 Feb 2011
Updated 17 Feb 2011
Type Journal
Year 2009
Where EMNLP