When exposed to environmental noise, speakers adjust their speech production to maintain intelligible communication. This phenomenon, called the Lombard effect (LE), is known to consi...
In this paper, we revisit the linear transformation for VTLN on conventional MFCC proposed by Sanand et al. in [1], using the idea of band-limited interpolation. The filter-bank i...
The multimodal nature of speech is often ignored in human-computer interaction, but lip deformations and other body motion, such as those of the head, convey additional information...
Iain Matthews, Timothy F. Cootes, J. Andrew Bangha...
We demonstrate Parakeet – a continuous speech recognition system for mobile touch-screen devices. Parakeet’s interface is designed to make correcting errors easy on a handheld...
This paper proposes a new method for bimodal information fusion in audio-visual speech recognition, where cross-modal association is considered at two levels. First, the acoustic a...