A spoken language generation system has been developed that learns to describe objects in computer-generated visual scenes. The system is trained by a `show-and-tell' procedu...
We propose an HMM Trajectory Tiling (HTT) approach to high quality TTS, which is our entry to Blizzard Challenge 2010. In HTT, first refined HMM is trained with the Minimum Genera...
Yao Qian, Zhi-Jie Yan, Yijian Wu, Frank K. Soong, ...
Automatically extracting semantic content from audio streams can be helpful in many multimedia applications. Motivated by the known limitations of traditional supervised approache...
Translation of proper names is generally recognized as a significant problem in many multi-lingual text and speech processing applications. Even when large bilingual lexicons use...
Autonomous synthetic actors must invent variations of known material in order to perform given only a limited script, and to assist the director with development of the performance...