INTERSPEECH
2010

An HMM trajectory tiling (HTT) approach to high quality TTS

We propose an HMM Trajectory Tiling (HTT) approach to high quality TTS, which is our entry to Blizzard Challenge 2010. In HTT, a refined HMM is first trained with the Minimum Generation Error (MGE) criterion; the trajectory generated by the refined HMM then guides the search for the closest waveform segment "tiles" in synthesis. Normalized distances between the HMM trajectory and those of the waveform unit candidates are used for selecting final candidates in a unit sausage (lattice). Normalized cross-correlation, a good concatenation measure for its high relevance to spectral similarity, phase continuity and concatenation time instants, is used for finding the best unit sequence in the sausage. The sequence serves as the best segment tiles to closely follow the HMM trajectory guide. Tested in four tasks, {EH1, EH2, MH1 and MH2}, of Blizzard Challenge 2010, the new HTT approach delivers high quality, natural sounding TTS speech without sacrificing high intelligibility. Su...
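As a rough illustration of the concatenation measure named in the abstract, the sketch below computes a normalized cross-correlation (NCC) between the end of one waveform unit and the start of the next, then picks the offset with the highest score. It is not the authors' implementation: the function names, the `window` and `max_lag` parameters, and the idea of scanning integer sample lags are all assumptions made for this example.

```python
import numpy as np


def normalized_cross_correlation(x, y):
    """NCC of two equal-length 1-D signals; the result lies in [-1, 1]."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    denom = np.sqrt(np.dot(x, x) * np.dot(y, y))
    if denom == 0.0:
        # At least one signal is all zeros, so the correlation is undefined;
        # returning 0.0 here is a convention chosen for this sketch.
        return 0.0
    return float(np.dot(x, y) / denom)


def best_concatenation_lag(left_unit, right_unit, window, max_lag):
    """Find the lag into right_unit that best matches the end of left_unit.

    The last `window` samples of left_unit are compared with each
    `window`-sample slice of right_unit starting at lag 0..max_lag.
    Returns (best_lag, best_score). Both parameters are hypothetical
    knobs for this example, not values taken from the paper.
    """
    ref = np.asarray(left_unit, dtype=float)[-window:]
    right = np.asarray(right_unit, dtype=float)
    best_lag, best_score = 0, -np.inf
    for lag in range(max_lag + 1):
        segment = right[lag:lag + window]
        if len(segment) < window:
            break  # not enough samples left in right_unit
        score = normalized_cross_correlation(ref, segment)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag, best_score
```

A score near 1 suggests that the two units line up in both waveform shape and phase at that lag, which is the property the abstract credits to NCC as a concatenation measure. In a lattice search, a score of this kind could serve as the join cost between adjacent candidates, although how the paper combines it with the normalized distances is not stated in the text above.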
Added 18 May 2011
Updated 18 May 2011
Type Journal
Year 2010
Where INTERSPEECH
Authors Yao Qian, Zhi-Jie Yan, Yijian Wu, Frank K. Soong, Xin Zhuang, Shengyi Kong