ICMI 2004, Springer

A segment-based audio-visual speech recognizer: data collection, development, and initial experiments

This paper presents the development and evaluation of a speaker-independent audio-visual speech recognition (AVSR) system that uses a segment-based modeling strategy. To support this research, we have collected a new video corpus, called Audio-Visual TIMIT (AV-TIMIT), which consists of four hours of read speech collected from 223 different speakers. This corpus was used to evaluate our AVSR system, which incorporates a novel audio-visual integration scheme using segment-constrained Hidden Markov Models (HMMs). Preliminary experiments demonstrate improvements in phonetic recognition performance when visual information is incorporated into the speech recognition process.

Categories and Subject Descriptors: I.2.M [Artificial Intelligence]: Miscellaneous

General Terms: Algorithms, Design, Experimentation

Keywords: Audio-visual speech recognition, audio-visual corpora
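As background on how visual information can improve phonetic recognition, the sketch below shows one common audio-visual decision-fusion idea: combining per-segment log-likelihoods from separate audio and visual models with a tunable stream weight. This is only an illustrative assumption, not the paper's segment-constrained HMM integration scheme; the function names, array shapes, and the 0.7 weight are all hypothetical.

```python
import numpy as np


def combine_av_scores(audio_logprobs, visual_logprobs, audio_weight=0.7):
    """Linearly combine per-segment log-likelihoods from audio and visual models.

    audio_logprobs, visual_logprobs: arrays of shape (num_segments, num_classes).
    audio_weight: relative reliability of the audio stream; the remainder goes to
    the visual stream. In practice this weight would be tuned on held-out data.
    """
    w = audio_weight
    return w * np.asarray(audio_logprobs) + (1.0 - w) * np.asarray(visual_logprobs)


def recognize_segments(audio_logprobs, visual_logprobs, phone_labels, audio_weight=0.7):
    """Pick the highest-scoring phone hypothesis for each segment from the fused scores."""
    fused = combine_av_scores(audio_logprobs, visual_logprobs, audio_weight)
    best = fused.argmax(axis=1)
    return [phone_labels[i] for i in best]


if __name__ == "__main__":
    # Toy usage: two segments, three candidate phones (labels are made up).
    audio = np.log([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]])
    visual = np.log([[0.4, 0.4, 0.2], [0.1, 0.7, 0.2]])
    print(recognize_segments(audio, visual, ["ah", "iy", "uw"]))
```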
Added: 01 Jul 2010
Updated: 01 Jul 2010
Type: Conference
Year: 2004
Where: ICMI
Authors: Timothy J. Hazen, Kate Saenko, Chia-Hao La, James R. Glass