This paper presents preliminary work on building a system that concurrently synthesizes the speech signal and a 3D animation of the speaker's face. This is done by concaten...
As a child acquires language, he or she: perceives acoustic information in his or her surrounding environment; identifies portions of the ambient acoustic information as languager...
Andrew R. Plummer, Mary E. Beckman, Mikhail Belkin...
Japanese listeners detected Japanese words embedded at the end of nonsense sequences (e.g., kaba 'hippopotamus' in gyachikaba). When the final portion of the preceding c...
Discriminative confidence estimation and confidence normalisation have been shown to yield robust decision-maker modules in spoken term detection (STD) systems. Discrim...
Javier Tejedor, Doroteo Torre Toledano, Miguel Bau...
In this paper, we propose a novel approach to estimate three types of phone mismatch penalty matrices for two-stage keyword spotting. When the output of a phone recognizer is give...
Chang Woo Han, Shin Jae Kang, Chul Min Lee, Nam So...