Continuous F0 in the source-excitation generation for HMM-based TTS: Do we need voiced/unvoiced classification?


Most HMM-based TTS systems use a hard voiced/unvoiced classification to produce a discontinuous F0 signal, which is then used to generate the source excitation. When a mixed source excitation is used, this decision can be based on two different sources of information: the state-specific MSD-prior of the F0 models, and/or the frame-specific features generated by the aperiodicity model. This paper examines the meaning of these variables in the synthesis process, their interaction, and how they affect the perceived quality of the generated speech. The results of several perceptual experiments show that when using mixed excitation, subjects consistently prefer samples with very few or no false unvoiced errors, whereas a reduction in the rate of false voiced errors does not produce any perceptual improvement. This suggests that rather than using any form of hard voiced/unvoiced classification, e.g., the MSD-prior, it is better for synthesis to use a continuous F0 signal and rely on ...
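To make the contrast in the abstract concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes a single-band aperiodicity value per frame used directly as an amplitude weight between a pulse train and Gaussian noise, and all names (`mixed_excitation`, `voiced`, `frame_len`, the sample rate) are hypothetical. With `voiced=None` the F0 track is treated as continuous and only the aperiodicity decides how noisy each frame is; passing a `voiced` mask imposes a hard voiced/unvoiced decision on top.

```python
import math
import random


def mixed_excitation(f0, aperiodicity, voiced=None, sample_rate=16000,
                     frame_len=80, seed=0):
    """Build a toy mixed-excitation signal frame by frame.

    f0            -- per-frame F0 in Hz (continuous: defined for every frame)
    aperiodicity  -- per-frame noise weight in [0, 1] (single band; an
                     illustrative simplification)
    voiced        -- optional per-frame booleans; frames flagged unvoiced
                     are forced to pure noise (hard V/UV decision)
    """
    rng = random.Random(seed)
    phase = 0.0
    signal = []
    for i, (hz, ap) in enumerate(zip(f0, aperiodicity)):
        if voiced is not None and not voiced[i]:
            ap = 1.0  # hard decision: discard the periodic component
        for _ in range(frame_len):
            phase += hz / sample_rate
            pulse = 0.0
            if phase >= 1.0:
                phase -= 1.0
                # Scale so the pulse train alone has unit average power.
                pulse = math.sqrt(sample_rate / hz)
            noise = rng.gauss(0.0, 1.0)
            signal.append((1.0 - ap) * pulse + ap * noise)
    return signal


# Four frames at 120 Hz; the hypothetical mask wrongly marks frame 1 as
# unvoiced, i.e. a "false unvoiced" error in the abstract's terminology.
f0 = [120.0] * 4
ap = [0.1, 0.1, 0.2, 0.2]
continuous = mixed_excitation(f0, ap)
hard = mixed_excitation(f0, ap, voiced=[True, False, True, True])
```

In this sketch a false unvoiced error removes the periodic component of a frame entirely, whereas the continuous variant always keeps some periodic energy and lets the aperiodicity weight control the noise level.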

Javier Latorre, Mark J. F. Gales, Sabine Buchholz,


Aperiodicity Model | ICASSP 2011 | Mixed Source Excitation | Most HMM-based TTS | Signal Processing |


Post Info

Added	20 Aug 2011
Updated	20 Aug 2011
Type	Journal
Year	2011
Where	ICASSP
Authors	Javier Latorre, Mark J. F. Gales, Sabine Buchholz, Kate Knill, Masatsune Tamura, Yamato Ohtani, Masami Akamine
