Visual Speech Recognition with Loosely Synchronized Feature Streams

We present an approach to detecting and recognizing spoken isolated phrases based solely on visual input. We adopt an architecture that first employs discriminative detection of visual speech and articulatory features, and then performs recognition using a model that accounts for the loose synchronization of the feature streams. Discriminative classifiers detect the subclass of lip appearance corresponding to the presence of speech, and further decompose it into features corresponding to the physical components of articulatory production. These components often evolve in a semi-independent fashion, and conventional viseme-based approaches to recognition fail to capture the resulting co-articulation effects. We present a novel dynamic Bayesian network with a multi-stream structure and observations consisting of articulatory feature classifier scores, which can model varying degrees of co-articulation in a principled way. We evaluate our visual-only recognition system on a command utt...
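The idea of loosely synchronized feature streams can be illustrated with a small decoding sketch. This is not the authors' dynamic Bayesian network: it is a simplified two-stream Viterbi pass with uniform transitions, in which each stream walks through its own target sequence and the two positions may differ by at most `max_async` steps. The function name, the `max_async` parameter, the binary feature values, and the toy scores are all assumptions made for illustration.

```python
import math
from itertools import product

NEG_INF = float("-inf")


def loosely_synced_viterbi(scores_a, scores_b, targets_a, targets_b, max_async=1):
    """Best log-score of aligning two feature streams to one phrase model.

    scores_a[t][v], scores_b[t][v]: classifier log-score of feature value v
        at frame t (hypothetical inputs standing in for classifier outputs).
    targets_a, targets_b: per-stream target feature values for the phrase,
        of equal length.
    max_async: largest allowed gap between the two streams' positions.
    """
    n_frames, n = len(scores_a), len(targets_a)
    assert n_frames >= 1 and len(scores_b) == n_frames and len(targets_b) == n

    def emit(t, i, j):
        # Joint observation score: the streams are scored independently.
        return scores_a[t][targets_a[i]] + scores_b[t][targets_b[j]]

    # best[(i, j)]: best log-score with stream A at position i, stream B at j.
    best = {(0, 0): emit(0, 0, 0)}
    for t in range(1, n_frames):
        nxt = {}
        for (i, j), score in best.items():
            # Each stream either stays in place or advances by one position.
            for di, dj in product((0, 1), repeat=2):
                ni, nj = i + di, j + dj
                if ni >= n or nj >= n or abs(ni - nj) > max_async:
                    continue  # past the end, or streams too far apart
                cand = score + emit(t, ni, nj)
                if cand > nxt.get((ni, nj), NEG_INF):
                    nxt[(ni, nj)] = cand
        best = nxt
    # Both streams must finish in their final positions.
    return best.get((n - 1, n - 1), NEG_INF)


# Toy data: stream A changes value at frame 1, stream B one frame later.
HI, LO = math.log(0.9), math.log(0.1)
scores_a = [[HI, LO], [LO, HI], [LO, HI]]
scores_b = [[HI, LO], [HI, LO], [LO, HI]]
targets = [0, 1]

sync_score = loosely_synced_viterbi(scores_a, scores_b, targets, targets, max_async=0)
async_score = loosely_synced_viterbi(scores_a, scores_b, targets, targets, max_async=1)
```

With `max_async=0` the two streams are forced to change together, as in a single-stream viseme model, so one of them is mismatched for a frame. Allowing one step of asynchrony lets each stream change at its own frame, and the toy example scores higher.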

Kate Saenko, Karen Livescu, Michael Siracusa, Kevin Wilson, James R. Glass, Trevor Darrell

Articulatory Feature | Articulatory Feature Classifier | Computer Vision | Employs Discriminative Detection | ICCV 2005

» Using multiple visual tandem streams in audiovisual speech recognition

» Continuous Audio-Visual Speech Recognition

» Rapid Feature Space Speaker Adaptation for Multi-Stream HMM-Based Audio-Visual Speech Recogni...

» Translingual Visual Speech Synthesis

» Cross-modal Matching of Speakers Using Lip and Voice Features in Temporally Non-Overlapping ...

» Dynamic Audio-Visual Mapping using Fused Hidden Markov Model Inversion Method

» Tools for Building Asynchronous Servers to Support Speech and Audio Applications

» Text-to-Audiovisual Speech Synthesizer

Post Info

Added	24 Jun 2010
Updated	24 Jun 2010
Type	Conference
Year	2005
Where	ICCV
Authors	Kate Saenko, Karen Livescu, Michael Siracusa, Kevin Wilson, James R. Glass, Trevor Darrell
