Two-Level Bimodal Association for Audio-Visual Speech Recognition

This paper proposes a new method for bimodal information fusion in audio-visual speech recognition, in which cross-modal association is considered at two levels. First, the acoustic and visual data streams are combined at the feature level using canonical correlation analysis, which addresses audio-visual synchronization and exploits the cross-modal correlation. Second, the information streams are integrated at the decision level, so that the fusion adapts to the noise condition of the given speech datum. Experimental results demonstrate that the proposed method yields noise-robust recognition performance without a priori knowledge of the noise conditions of the speech data.
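As a rough illustration of the two levels named in the abstract, the sketch below computes the first pair of canonical directions between two feature streams and then combines per-class log-likelihoods with a stream weight. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the regularization term `reg`, and the weighted log-likelihood rule are all hypothetical, and the abstract does not say how the adaptive weight is estimated, so here it is simply passed in.

```python
import numpy as np

def cca_first_pair(X, Y, reg=1e-6):
    """First canonical directions (a, b) and correlation rho for two streams.

    X: (n, p) acoustic features, Y: (n, q) visual features (hypothetical layout).
    reg is a small ridge term added for numerical stability (an assumption).
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    # Whiten each stream with its Cholesky factor, then take the SVD of the
    # whitened cross-covariance; singular values are canonical correlations.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    M = np.linalg.solve(Lx, Sxy) @ np.linalg.inv(Ly).T
    U, s, Vt = np.linalg.svd(M)
    a = np.linalg.solve(Lx.T, U[:, 0])   # direction in the acoustic space
    b = np.linalg.solve(Ly.T, Vt[0, :])  # direction in the visual space
    return a, b, s[0]

def fuse_scores(log_lik_audio, log_lik_visual, audio_weight):
    """Decision-level fusion: weighted sum of per-class log-likelihoods.

    audio_weight in [0, 1] is assumed to be supplied by some noise-dependent
    estimator that is not specified here. Returns the winning class index.
    """
    la = np.asarray(log_lik_audio, dtype=float)
    lv = np.asarray(log_lik_visual, dtype=float)
    fused = audio_weight * la + (1.0 - audio_weight) * lv
    return int(np.argmax(fused))
```

In this sketch, lowering `audio_weight` shifts the decision toward the visual stream, which is the general behaviour one would want when the acoustic signal is noisy.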

Jong-Seok Lee, Touradj Ebrahimi

ACIVS 2009 | Audio-Visual Speech Recognition | Computer Vision | Noise Conditions | Visual Data Streams |

Post Info

Added	25 May 2010
Updated	25 May 2010
Type	Conference
Year	2009
Where	ACIVS
Authors	Jong-Seok Lee, Touradj Ebrahimi
