Automatic Set Instance Extraction using the Web

An important and well-studied problem is the production of semantic lexicons from a large corpus. In this paper, we present a system named ASIA (Automatic Set Instance Acquirer), which takes the name of a semantic class as input (e.g., "car makers") and automatically outputs its instances (e.g., "ford", "nissan", "toyota"). ASIA is based on recent advances in web-based set expansion: the problem of finding all instances of a set given a small number of "seed" instances. This approach effectively exploits web resources and can be easily adapted to different languages. In brief, we use language-dependent hyponym patterns to find a noisy set of initial seeds, and then use a state-of-the-art language-independent set expansion system to expand these seeds. The proposed approach matches or outperforms prior systems on several English-language benchmarks. It also shows excellent performance on three dozen additional benchmark problems from E...
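The first stage described above, using language-dependent hyponym patterns to collect a noisy seed set for a class name, can be illustrated with a small Python sketch. It is not ASIA's implementation: the two Hearst-style English patterns ("X such as A, B, and C", "X including A and B", "such X as A and B"), the capitalized-token rule for candidates, the frequency cutoff `top_k`, and the example snippets are all illustrative assumptions. The second stage, expanding the seeds with a language-independent set expansion system, is not sketched here.

```python
import re
from collections import Counter

# Assumption: a candidate instance is one or more capitalized tokens,
# e.g. "Ford" or "General Motors".
ITEM = r"[A-Z][\w&-]*(?:\s[A-Z][\w&-]*)*"
# A coordinated list of candidates: "A, B, and C" or "A and B".
LIST = rf"(?P<list>{ITEM}(?:\s*,\s*(?:and\s+|or\s+)?{ITEM}|\s+(?:and|or)\s+{ITEM})*)"


def hyponym_patterns(class_name):
    """Hypothetical English hyponym patterns; the class name matches case-insensitively."""
    cls = re.escape(class_name)
    return [
        re.compile(rf"(?i:{cls})\s+(?i:such\s+as|including)\s+{LIST}"),
        re.compile(rf"(?i:such\s+{cls}\s+as)\s+{LIST}"),
    ]


def split_list(text):
    """Split 'A, B, and C' into ['a', 'b', 'c'] (lowercased)."""
    parts = re.split(r"\s*,\s*(?:and\s+|or\s+)?|\s+(?:and|or)\s+", text)
    return [p.strip().lower() for p in parts if p.strip()]


def extract_seeds(class_name, snippets, top_k=3):
    """Count candidates matched by any pattern; keep the top_k as noisy seeds."""
    counts = Counter()
    for snippet in snippets:
        for pattern in hyponym_patterns(class_name):
            for match in pattern.finditer(snippet):
                counts.update(split_list(match.group("list")))
    # Ties keep first-seen order, since most_common sorts stably.
    return [item for item, _ in counts.most_common(top_k)]


# Hypothetical web snippets standing in for search-engine results.
SNIPPETS = [
    "Japanese car makers such as Toyota, Nissan, and Honda dominate the segment.",
    "Such car makers as Ford and Toyota invest heavily in batteries.",
    "Several car makers including Toyota and Ford recalled vehicles.",
    "Car makers including Nissan posted gains.",
]

print(extract_seeds("car makers", SNIPPETS))  # → ['toyota', 'nissan', 'ford']
```

Keeping only the most frequent candidates is one simple way to trade recall for precision in the seed set; since the seeds are only a starting point for set expansion, a small and somewhat noisy list may be acceptable.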
Richard C. Wang, William W. Cohen
Added 16 Feb 2011
Updated 16 Feb 2011
Type Conference
Year 2009
Where ACL