ICASSP
2011
IEEE

Tonal context labeling using quantized F0 symbols for improving tone correctness in average-voice-based speech synthesis

This paper proposes a technique for improving tone correctness in Thai speech synthesis based on an average voice model trained on a nonprofessional speech corpus. The proposed technique uses quantized F0 symbols as the tonal context in order to obtain an appropriate F0 model. With this technique, the prosodic context can be extracted directly from real speech, which prevents inconsistency between the speech data and F0 labels generated from transcription; such inconsistency degrades the naturalness and tone correctness of synthetic speech. We examine two types of tonal context labeling using the quantized F0 symbols, based on phone boundaries and on sub-phone boundaries. Results of both objective and subjective tests show that the proposed technique improves not only the naturalness but also the tone correctness of synthetic speech, even when only a small amount of speech data from nonprofessional target speakers is available.
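The abstract does not describe the quantization scheme itself, so the following is only a minimal sketch of the general idea of deriving tonal context symbols from measured F0. It assumes uniform quantization of the mean log-F0 of each segment, and that frames with F0 equal to 0 are unvoiced. The function names, the `num_levels` and `parts` parameters, and the equal-length sub-phone split are all hypothetical and are not taken from the paper.

```python
import math


def split_sub_phone(boundaries, parts=3):
    """Split each (start, end) frame range into `parts` roughly equal pieces.

    Assumption: sub-phone segments are equal-length slices of a phone.
    """
    sub = []
    for start, end in boundaries:
        length = end - start
        for k in range(parts):
            sub.append((start + k * length // parts,
                        start + (k + 1) * length // parts))
    return sub


def quantize_f0(f0_hz, boundaries, num_levels=5):
    """Return one symbol per segment: an int in 0..num_levels-1, or None.

    Each segment's mean log-F0 is mapped uniformly onto the log-F0 range
    of the whole utterance. Frames with F0 <= 0 are treated as unvoiced.
    """
    voiced = [math.log(f) for f in f0_hz if f > 0]
    if not voiced:
        return [None] * len(boundaries)
    low, high = min(voiced), max(voiced)
    span = (high - low) or 1.0  # avoid division by zero for flat F0

    symbols = []
    for start, end in boundaries:
        seg = [math.log(f) for f in f0_hz[start:end] if f > 0]
        if not seg:
            symbols.append(None)  # fully unvoiced segment gets no tone symbol
            continue
        mean = sum(seg) / len(seg)
        level = int((mean - low) / span * num_levels)
        symbols.append(max(0, min(level, num_levels - 1)))
    return symbols


# Hypothetical frame-level F0 track (Hz) and phone boundaries (frame indices).
f0_track = [100, 100, 100, 100, 0, 0, 200, 200, 200, 200]
phones = [(0, 4), (4, 6), (6, 10)]
phone_symbols = quantize_f0(f0_track, phones)
sub_phone_symbols = quantize_f0(f0_track, split_sub_phone(phones, parts=2))
```

Because the symbols are computed from the recorded F0 rather than from the transcription, a label produced this way reflects what the speaker actually produced, which is the consistency property the abstract emphasizes.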
Vataya Chunwijitra, Takashi Nose, Takao Kobayashi
Added 21 Aug 2011
Updated 21 Aug 2011
Type Journal
Year 2011
Where ICASSP