Voice conversion can be reduced to a problem to find a transformation function between the corresponding speech sequences of two speakers. Perhaps the most voice conversions meth...
In the present work we address the problem of phone duration modeling for the needs of emotional speech synthesis. Specifically, relying on ten well known machine learning techniqu...
We present a system for quickly and cheaply building transcribed speech corpora containing utterances from many speakers in a variety of acoustic conditions. The system consists o...
Thad Hughes, Kaisuke Nakajima, Linne Ha, Atul Vasu...
Basic language-inherent tempo cannot be isolated by the current metrics of speech rhythm. Here we propose the number of syllables per intonation unit as an appropriate measure, al...
Current speech recognition systems are often based on HMMs with state-clustered Gaussian Mixture Models (GMMs) to represent the context dependent output distributions. Though high...