IJCNLP
2005
Springer

Automatic Acquisition of Basic Katakana Lexicon from a Given Corpus

Abstract. Katakana, the Japanese phonographic script used mainly for loanwords, is a troublemaker in Japanese word segmentation. Because Katakana words are heavily domain-dependent and Katakana neologisms are numerous, it is almost impossible to construct and maintain a Katakana word dictionary by hand. This paper proposes an automatic method for segmenting Japanese Katakana compounds, which makes it possible to construct a precise and concise Katakana word dictionary automatically, given only a medium-sized or large Japanese corpus from some domain.
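The abstract does not spell out the segmentation algorithm, so the following is only a minimal illustrative sketch of the general idea of using corpus frequencies to split Katakana compounds. It is not the authors' method. The function names `katakana_counts` and `split_compound`, the `min_count` threshold, and the "split where both halves are frequent" heuristic are all assumptions made for illustration.

```python
import re
from collections import Counter

# Katakana letters (U+30A1..U+30FA) plus the prolonged sound mark (U+30FC).
KATAKANA_RUN = re.compile(r"[\u30A1-\u30FA\u30FC]+")


def katakana_counts(corpus_lines):
    """Count every maximal run of Katakana characters in the corpus."""
    counts = Counter()
    for line in corpus_lines:
        counts.update(KATAKANA_RUN.findall(line))
    return counts


def split_compound(word, counts, min_count=2):
    """Try to split `word` into two parts that both occur on their own.

    Hypothetical heuristic (an assumption, not taken from the paper):
    accept a split point only if both halves appear at least `min_count`
    times as standalone Katakana runs, and prefer the split whose rarer
    half is most frequent. Returns [word] when no split qualifies.
    """
    best, best_score = None, 0
    for i in range(1, len(word)):
        left, right = word[:i], word[i:]
        left_count, right_count = counts.get(left, 0), counts.get(right, 0)
        if left_count >= min_count and right_count >= min_count:
            score = min(left_count, right_count)
            if score > best_score:
                best, best_score = [left, right], score
    return best if best else [word]
```

Under these assumptions, a compound such as データベース would be split into データ and ベース only if both parts occur often enough on their own in the given corpus; a word with no frequent sub-parts is kept whole as a dictionary entry candidate.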
Toshiaki Nakazawa, Daisuke Kawahara, Sadao Kurohashi
Added 27 Jun 2010
Updated 27 Jun 2010
Type Conference
Year 2005
Where IJCNLP
Authors Toshiaki Nakazawa, Daisuke Kawahara, Sadao Kurohashi