Sciweavers

ICMI
2004
Springer

Articulatory features for robust visual speech recognition

Visual information has been shown to improve the performance of speech recognition systems in noisy acoustic environments. However, most audio-visual speech recognizers rely on a clean visual signal. In this paper, we explore a novel approach to visual speech modeling, based on articulatory features, which has potential benefits under visually challenging conditions. The idea is to use a set of parallel SVM classifiers to extract different articulatory attributes from the input images, and then combine their decisions to obtain higher-level units, such as visemes or words. We evaluate our approach in a preliminary experiment on a small audio-visual database, using several image noise conditions, and compare it to the standard viseme-based modeling approach.
Categories and Subject Descriptors: I.4 [Image Processing and Computer Vision]
General Terms: Algorithms, Design, Experimentation.
Keywords: Multimodal interfaces, audio-visual speech recognition, speechreading, visual feature extract...
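The decision-combination step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the feature names (`lip_opening`, `lip_rounding`), the viseme table, and the naive product rule are all assumptions made for illustration, and the per-feature posteriors are hand-written stand-ins for the outputs of the parallel SVM classifiers.

```python
import math

# Hypothetical mapping from visemes to articulatory feature values.
# Neither the feature set nor the viseme names come from the paper.
VISEME_TABLE = {
    "bilabial_closure": {"lip_opening": "closed", "lip_rounding": "unrounded"},
    "rounded_vowel": {"lip_opening": "narrow", "lip_rounding": "rounded"},
    "open_vowel": {"lip_opening": "wide", "lip_rounding": "unrounded"},
}


def combine_decisions(posteriors, table=VISEME_TABLE, floor=1e-6):
    """Pick the viseme whose feature values best match the classifier outputs.

    posteriors maps each feature name to a dict of value -> probability,
    standing in for the output of one per-feature classifier.
    Features are treated as independent (an assumption), so the score of a
    viseme is the sum of the log-probabilities of its feature values.
    """
    scores = {}
    for viseme, spec in table.items():
        scores[viseme] = sum(
            math.log(max(posteriors[feature].get(value, 0.0), floor))
            for feature, value in spec.items()
        )
    best = max(scores, key=scores.get)
    return best, scores


# Stand-in classifier outputs for a single video frame (made-up numbers).
frame = {
    "lip_opening": {"closed": 0.7, "narrow": 0.2, "wide": 0.1},
    "lip_rounding": {"rounded": 0.3, "unrounded": 0.7},
}

best, scores = combine_decisions(frame)
print(best)
```

Because each feature is scored separately, a noisy estimate of one attribute lowers a viseme's score without overriding the evidence from the others, which is one plausible reading of the robustness argument in the abstract.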
Kate Saenko, Trevor Darrell, James R. Glass
Added 01 Jul 2010
Updated 01 Jul 2010
Type Conference
Year 2004
Where ICMI