INTERSPEECH
2010

Topic and style-adapted language modeling for Thai broadcast news ASR

The amount of transcribed Thai broadcast news text available for training a language model is still very limited compared with other major languages. Because constructing a broadcast news corpus is costly and time-consuming, newspaper text is often used to enlarge the training data. This paper proposes a topic and style adaptation approach for the language model of a Thai broadcast news ASR system, using both broadcast news and newspaper text. A rule-based speaking style classification method, based on the presence of certain specific words, is applied to classify the training text. Various language models adapted to topics and styles are studied and shown to reduce both test set perplexity and recognition error rate. The results also show that written-style text from newspapers can alleviate the sparseness of the broadcast news corpus, while spoken-style text from the broadcast news corpus remains essential for building a reliable language model.
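The following is a minimal sketch of the two ideas the abstract names: classifying text as spoken or written style by the presence of marker words, and measuring perplexity under a style-adapted model. It is not the authors' implementation. The marker word list, the toy English corpus, the add-one smoothed unigram models, and the use of linear interpolation with weight `lam` are all assumptions made for illustration; the abstract does not specify the word list or the adaptation method, and real Thai text would first need word segmentation.

```python
import math
from collections import Counter

# Hypothetical spoken-style marker words (English placeholders; the actual
# Thai word list used in the paper is not given in the abstract).
SPOKEN_MARKERS = {"okay", "well", "thanks", "you"}


def classify_style(sentence):
    """Label a sentence 'spoken' if it contains any marker word, else 'written'."""
    tokens = sentence.lower().split()
    return "spoken" if any(t in SPOKEN_MARKERS for t in tokens) else "written"


def unigram_model(sentences, vocab):
    """Add-one smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(t for s in sentences for t in s.lower().split())
    total = sum(counts[w] for w in vocab) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}


def interpolate(p_spoken, p_written, lam):
    """Linear interpolation: lam * spoken model + (1 - lam) * written model."""
    return {w: lam * p_spoken[w] + (1 - lam) * p_written[w] for w in p_spoken}


def perplexity(model, sentence):
    """Per-word perplexity of a sentence whose words are all in the vocabulary."""
    tokens = sentence.lower().split()
    log_prob = sum(math.log(model[t]) for t in tokens)
    return math.exp(-log_prob / len(tokens))


# Toy corpus standing in for mixed broadcast news and newspaper text.
corpus = [
    "okay here is the evening news",
    "the government announced a new budget",
    "thanks for watching the news",
    "the ministry published the budget report",
]
spoken = [s for s in corpus if classify_style(s) == "spoken"]
written = [s for s in corpus if classify_style(s) == "written"]
vocab = {t for s in corpus for t in s.split()}

mixed = interpolate(
    unigram_model(spoken, vocab), unigram_model(written, vocab), lam=0.7
)
ppl = perplexity(mixed, "here is the budget news")
print(f"perplexity under the interpolated model: {ppl:.2f}")
```

In this sketch, varying `lam` shifts weight between the spoken-style and written-style models, which is one simple way to explore how much each text source contributes on a held-out test sentence.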
Markpong Jongtaveesataporn, Sadaoki Furui
Added 19 May 2011
Updated 19 May 2011
Type Journal
Year 2010
Where INTERSPEECH