Audiovisual-to-articulatory speech inversion using Active Appearance Models for the face and Hidden Markov Models for the dynami

Download cvsp.cs.ntua.gr

We are interested in recovering aspects of the vocal tract’s geometry and dynamics from auditory and visual speech cues. We approach the problem in a statistical framework based on Hidden Markov Models (HMMs) and demonstrate effective estimation of the trajectories followed by certain points of interest in the speech production system. Alternative fusion schemes are investigated to account for asynchrony between the modalities and to allow independent modeling of the dynamics of the streams involved. Visual cues are extracted from the speaker’s face by means of Active Appearance Modeling. We report experiments on the QSMT database, which contains audio, video, and electromagnetic articulography data recorded in parallel. The results show that exploiting both audio and visual modalities in a multistream HMM-based scheme clearly improves performance relative to either audio-only or visual-only estimation.
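To make the multistream idea concrete, the following is a minimal sketch of how a single HMM state might score a joint audio and visual observation as a weighted sum of per-stream log-likelihoods. It is not the authors' implementation: the function names, the dictionary layout of `state`, the diagonal-covariance Gaussians, and the stream weight `w_audio` are all assumptions made for illustration. It covers only the state-synchronous case and does not model the asynchrony between modalities that the abstract mentions.

```python
import numpy as np


def gaussian_loglik(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian evaluated at x."""
    x, mean, var = (np.asarray(a, dtype=float) for a in (x, mean, var))
    return float(-0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var))


def multistream_loglik(audio, visual, state, w_audio=0.5):
    """Weighted sum of audio and visual log-likelihoods for one HMM state.

    `state` is a hypothetical dict holding per-stream Gaussian parameters.
    `w_audio` in [0, 1] is an assumed stream weight; the visual stream
    receives the complementary weight 1 - w_audio.
    """
    log_audio = gaussian_loglik(audio, state["audio_mean"], state["audio_var"])
    log_visual = gaussian_loglik(visual, state["visual_mean"], state["visual_var"])
    return w_audio * log_audio + (1.0 - w_audio) * log_visual


# Toy state: 2-D audio features and 1-D visual features (illustrative values).
toy_state = {
    "audio_mean": [0.0, 0.0],
    "audio_var": [1.0, 1.0],
    "visual_mean": [0.0],
    "visual_var": [1.0],
}
score = multistream_loglik([0.1, -0.2], [0.3], toy_state, w_audio=0.7)
```

Setting `w_audio` to 1.0 or 0.0 reduces the score to audio-only or visual-only scoring, which mirrors the single-modality baselines that the abstract compares against.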

Athanassios Katsamanis, George Papandreou, Petros Maragos

Hidden Markov Models | ICASSP 2008 | Signal Processing | Visual Speech Cues | Vocal Tract Geometry |

» Tracking Using Dynamic Programming for Appearance-Based Sign Language Recognition

» Recognizing Peoples Faces from Human to Machine Vision

Post Info

Added	30 May 2010
Updated	30 May 2010
Type	Conference
Year	2008
Where	ICASSP
Authors	Athanassios Katsamanis, George Papandreou, Petros Maragos
