
ICDAR
2003
IEEE

Learning the lexicon from raw texts for open-vocabulary Korean word recognition

In this paper, we propose a novel method of building a language model for open-vocabulary Korean word recognition. Because of the complex morphology of Korean, lexicons based on linguistic entities such as words and morphemes are inappropriate in open-vocabulary domains. Instead, we build the lexicon by collecting variable-length character sequences from raw texts using a dynamic Bayesian network model of the language. In simulated word recognition experiments, the proposed language model found the correct words from lattices of character candidates in 94.3% of cases, increasing the word recognition rate by 20.9%.
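The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of picking a word from a lattice of character candidates with a lexicon of variable-length character sequences. It is not the authors' dynamic Bayesian network: a simple unigram probability per lexicon unit is swapped in, and the function name `decode_lattice`, the toy lexicon, and all probabilities are hypothetical values chosen for illustration.

```python
import math


def decode_lattice(lattice, lexicon):
    """Find the best-scoring string through a character-candidate lattice.

    lattice: list of dicts, one per character position, mapping each
             candidate character to its recognizer probability (assumed input).
    lexicon: dict mapping variable-length character sequences ("units")
             to unigram probabilities (a stand-in for the paper's model).
    Returns (decoded_string, units), or (None, []) if no unit sequence
    covers the whole lattice.
    """
    n = len(lattice)
    max_len = max((len(unit) for unit in lexicon), default=0)
    scores = [-math.inf] * (n + 1)  # best log-score of a path ending at each position
    back = [None] * (n + 1)         # (start, unit) backpointer for that best path
    scores[0] = 0.0

    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            if scores[start] == -math.inf:
                continue
            for unit, unit_prob in lexicon.items():
                if len(unit) != end - start:
                    continue
                # Every character of the unit must be a candidate at its position.
                cand = [lattice[start + k].get(ch) for k, ch in enumerate(unit)]
                if any(p is None for p in cand):
                    continue
                score = (scores[start] + math.log(unit_prob)
                         + sum(math.log(p) for p in cand))
                if score > scores[end]:
                    scores[end] = score
                    back[end] = (start, unit)

    if scores[n] == -math.inf:
        return None, []

    # Follow backpointers from the end to recover the chosen units.
    units = []
    pos = n
    while pos > 0:
        start, unit = back[pos]
        units.append(unit)
        pos = start
    units.reverse()
    return "".join(units), units


# Hypothetical three-position lattice; the top-1 candidates spell "학고에".
lattice = [
    {"학": 0.6, "한": 0.4},
    {"고": 0.55, "교": 0.45},
    {"에": 0.7, "애": 0.3},
]
# Hypothetical lexicon of variable-length character sequences.
lexicon = {"학교": 0.4, "에": 0.3, "한": 0.2, "애": 0.1}

word, units = decode_lattice(lattice, lexicon)
print(word, units)
```

In this toy case the highest-ranked candidate at each position would spell "학고에", but the only unit sequences that cover the lattice start with "학교", so the decoder returns "학교에". A real system would need a far larger lexicon learned from raw text and a sequence model over units rather than independent unigram scores.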
Sungho Ryu, Jin Hyung Kim
Added 04 Jul 2010
Updated 04 Jul 2010
Type Conference
Year 2003
Where ICDAR
Authors Sungho Ryu, Jin Hyung Kim