ICMI 2004, Springer

A segment-based audio-visual speech recognizer: data collection, development, and initial experiments

This paper presents the development and evaluation of a speaker-independent audio-visual speech recognition (AVSR) system that uses a segment-based modeling strategy. To support this research, we have collected a new video corpus, called Audio-Visual TIMIT (AV-TIMIT), which consists of four hours of read speech collected from 223 different speakers. This corpus was used to evaluate our AVSR system, which incorporates a novel audio-visual integration scheme using segment-constrained Hidden Markov Models (HMMs). Preliminary experiments demonstrate improvements in phonetic recognition performance when visual information is incorporated into the speech recognition process.

Categories and Subject Descriptors: I.2.M [Artificial Intelligence]: Miscellaneous

General Terms: Algorithms, Design, Experimentation

Keywords: Audio-visual speech recognition, audio-visual corpora
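As background on how visual information can improve phonetic recognition, the sketch below shows one common audio-visual decision-fusion idea: combining per-segment log-likelihoods from separate audio and visual models with a tunable stream weight. This is only an illustrative assumption, not the paper's segment-constrained HMM integration scheme; the function names, array shapes, and the 0.7 weight are all hypothetical.

```python
import numpy as np


def combine_av_scores(audio_logprobs, visual_logprobs, audio_weight=0.7):
    """Linearly combine per-segment log-likelihoods from audio and visual models.

    audio_logprobs, visual_logprobs: arrays of shape (num_segments, num_classes).
    audio_weight: relative reliability of the audio stream; the remainder goes to
    the visual stream. In practice this weight would be tuned on held-out data.
    """
    w = audio_weight
    return w * np.asarray(audio_logprobs) + (1.0 - w) * np.asarray(visual_logprobs)


def recognize_segments(audio_logprobs, visual_logprobs, phone_labels, audio_weight=0.7):
    """Pick the highest-scoring phone hypothesis for each segment from the fused scores."""
    fused = combine_av_scores(audio_logprobs, visual_logprobs, audio_weight)
    best = fused.argmax(axis=1)
    return [phone_labels[i] for i in best]


if __name__ == "__main__":
    # Toy usage: two segments, three candidate phones (labels are made up).
    audio = np.log([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]])
    visual = np.log([[0.4, 0.4, 0.2], [0.1, 0.7, 0.2]])
    print(recognize_segments(audio, visual, ["ah", "iy", "uw"]))
```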
Added: 01 Jul 2010
Updated: 01 Jul 2010
Type: Conference
Year: 2004
Where: ICMI
Authors: Timothy J. Hazen, Kate Saenko, Chia-Hao La, James R. Glass