Japanese Named Entity Extraction with Redundant Morphological Analysis

Named Entity (NE) extraction is an important subtask of document processing tasks such as information extraction and question answering. A typical method for NE extraction from Japanese text is a cascade of morphological analysis, POS tagging and chunking. However, there are cases where the segmentation granularity produced by morphological analysis conflicts with the building units of NEs, so that extraction of some NEs is inherently impossible in this setting. To cope with this unit problem, we propose a character-based chunking method. First, the input sentence is analyzed redundantly by a statistical morphological analyzer to produce multiple (n-best) answers. Then, each character is annotated with its character type and with its possible POS tags from the top n-best answers. Finally, a support vector machine-based chunker picks out portions of the input sentence as NEs. This method gives the chunker richer information than previous methods that are based on a single morphol...
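The character annotation step described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the n-best analyses are written by hand rather than produced by a real morphological analyzer, the POS labels are simplified, the B/I/E/S position prefixes are one plausible encoding of where a character sits inside a word, and the names `char_type` and `annotate` are hypothetical. The SVM-based chunker that would consume these features is not shown.

```python
import unicodedata


def char_type(ch):
    """Return a coarse character class derived from the Unicode name."""
    name = unicodedata.name(ch, "")
    if "HIRAGANA" in name:
        return "HIRAGANA"
    if "KATAKANA" in name:
        return "KATAKANA"
    if "CJK UNIFIED IDEOGRAPH" in name:
        return "KANJI"
    if ch.isdigit():
        return "DIGIT"
    if ch.isalpha():
        return "ALPHA"
    return "OTHER"


def annotate(sentence, nbest):
    """Attach a character type and one POS tag per analysis to every character.

    `nbest` is a list of analyses; each analysis is a list of
    (surface, pos) pairs whose surfaces concatenate to `sentence`.
    """
    features = [{"char": ch, "type": char_type(ch), "pos": []} for ch in sentence]
    for analysis in nbest:
        if "".join(surface for surface, _ in analysis) != sentence:
            raise ValueError("analysis does not cover the sentence")
        i = 0
        for surface, pos in analysis:
            for k in range(len(surface)):
                if len(surface) == 1:
                    prefix = "S"  # single-character word
                elif k == 0:
                    prefix = "B"  # first character of the word
                elif k == len(surface) - 1:
                    prefix = "E"  # last character of the word
                else:
                    prefix = "I"  # word-internal character
                features[i]["pos"].append(prefix + "-" + pos)
                i += 1
    return features


# Hand-written 2-best analyses (assumed for illustration only).
sentence = "東京都に"
nbest = [
    [("東京", "Noun"), ("都", "Suffix"), ("に", "Particle")],
    [("東京都", "Noun"), ("に", "Particle")],
]
features = annotate(sentence, nbest)
for f in features:
    print(f["char"], f["type"], f["pos"])
```

Because every character carries tags from all n analyses, a chunker working at the character level can still see a word boundary that only one of the competing segmentations proposes.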
Masayuki Asahara, Yuji Matsumoto
Added 31 Oct 2010
Updated 31 Oct 2010
Type Conference
Year 2003