Sciweavers

ICASSP
2008
IEEE

Audiovisual-to-articulatory speech inversion using Active Appearance Models for the face and Hidden Markov Models for the dynami

We are interested in recovering aspects of the vocal tract’s geometry and dynamics from auditory and visual speech cues. We approach the problem in a statistical framework based on Hidden Markov Models and demonstrate effective estimation of the trajectories followed by certain points of interest in the speech production system. Alternative fusion schemes are investigated to account for asynchrony between the modalities and to allow independent modeling of the dynamics of the involved streams. Visual cues are extracted from the speaker’s face by means of Active Appearance Modeling. We report experiments on the QSMT database, which contains audio, video, and electromagnetic articulography data recorded in parallel. The results show that exploiting both audio and visual modalities in a multistream HMM-based scheme clearly improves performance relative to either audio-only or visual-only estimation.
Athanassios Katsamanis, George Papandreou, Petros Maragos
Added 30 May 2010
Updated 30 May 2010
Type Conference
Year 2008
Where ICASSP
Authors Athanassios Katsamanis, George Papandreou, Petros Maragos