Audio-Visual Speech Recognition (AVSR) uses vision to enhance speech recognition but also introduces the problem of how to join (or fuse) these two signals together. Mainstream re...
Statistical pattern recognition is a promising framework for text-to-text translation. However, a natural extension to speech-input translation is not straightforward. In this...
This paper proposes a new method for bimodal information fusion in audio-visual speech recognition, where cross-modal association is considered at two levels. First, the acoustic a...
This paper presents the development and evaluation of a speaker-independent audio-visual speech recognition (AVSR) system that utilizes a segment-based modeling strategy. To suppo...
Timothy J. Hazen, Kate Saenko, Chia-Hao La, James ...
Emotion expression is an essential part of human interaction. Rich emotional information is conveyed through the human face. In this study, we analyze detailed motion-captured fac...
Angeliki Metallinou, Carlos Busso, Sungbok Lee, Sh...