Sciweavers

COLING
2000

Jurilinguistic Engineering in Cantonese Chinese: An N-gram-based Speech to Text Transcription System

A Cantonese Chinese transcription system to automatically convert stenograph code to Chinese characters is reported. The major challenge in developing such a system is the critical homocode problem because of homonymy. The statistical N-gram model is used to compute the best combination of characters. Supplemented with a 0.85 million character corpus of domain-specific training data and enhancement measures, the bigram and trigram implementations achieve 95% and 96% accuracy respectively, as compared with 78% accuracy in the baseline model. The system performance is comparable with other advanced Chinese Speech-to-Text input applications under development. The system meets an urgent need of the Judiciary of post-1997 Hong Kong.
Keywords: Speech to Text, Statistical Modelling, Cantonese, Chinese, Language Engineering
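The abstract's core idea, using an N-gram model to pick the best combination of characters when one code maps to several homophonous characters, can be illustrated with a minimal sketch. This is not the authors' implementation: the romanized codes (`faat`, `gun`), the four-item toy corpus, the add-one smoothing, and the function names `train_bigram` and `decode` are all assumptions made for illustration. It shows a bigram model with a Viterbi search over candidate characters.

```python
import math
from collections import Counter

START = "<s>"  # hypothetical sentence-start marker


def train_bigram(corpus):
    """Count unigrams and bigrams over a list of character strings."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        prev = START
        unigrams[prev] += 1
        for ch in sentence:
            unigrams[ch] += 1
            bigrams[(prev, ch)] += 1
            prev = ch
    return unigrams, bigrams


def log_prob(prev, ch, unigrams, bigrams, vocab_size):
    # Add-one smoothing is an assumption; the abstract does not name a method.
    return math.log((bigrams[(prev, ch)] + 1) / (unigrams[prev] + vocab_size))


def decode(codes, lexicon, unigrams, bigrams):
    """Viterbi search for the most probable character sequence for the codes."""
    vocab_size = len(unigrams)
    # best maps the last character of a partial path to (log score, path).
    best = {START: (0.0, [])}
    for code in codes:
        new_best = {}
        for ch in lexicon[code]:  # every character sharing this code
            score, path = max(
                ((s + log_prob(prev, ch, unigrams, bigrams, vocab_size), p)
                 for prev, (s, p) in best.items()),
                key=lambda item: item[0],
            )
            new_best[ch] = (score, path + [ch])
        best = new_best
    return "".join(max(best.values(), key=lambda item: item[0])[1])


# Toy data (hypothetical): each code is shared by two characters.
lexicon = {"faat": ["法", "發"], "gun": ["官", "觀"]}
corpus = ["法官", "法庭", "發現", "觀察"]
unigrams, bigrams = train_bigram(corpus)
print(decode(["faat", "gun"], lexicon, unigrams, bigrams))
```

On this toy data the decoder selects 法官, because that character pair occurs in the training strings while the competing combinations do not. A trigram variant would condition each character on the two preceding characters instead of one.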
Added 01 Nov 2010
Updated 01 Nov 2010
Type Conference
Year 2000
Where COLING
Authors Benjamin K. Tsou, K. K. Sin, Samuel W. K. Chan, Tom B. Y. Lai, Caesar Suen Lun, K. T. Ko, Gary K. K. Chan, Lawrence Y. L. Cheung