Extraction of Audio Features Specific to Speech Production for Multimodal Speaker Detection

Download infoscience.epfl.ch

This paper presents a method that uses an information-theoretic framework to extract optimized audio features with the help of video information. A simple measure of mutual information (MI) between the resulting audio and video features allows the active speaker to be detected among several candidates. The method involves optimizing an MI-based objective function. No approximation is needed to solve this optimization problem, either in estimating the probability density functions (pdfs) of the features or in the cost function itself. The pdfs are estimated from the samples using a nonparametric approach. The challenging optimization problem is solved with a global method, the differential evolution algorithm. Two information-theoretic optimization criteria are compared, and their ability to extract audio features specific to speech production is discussed. Using these specific audio features, candidate video features are then classified as member of the "speaker" ...
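The pipeline described in the abstract (estimate MI from samples, optimize an MI-based objective with differential evolution, then compare candidates by MI) can be illustrated with a minimal sketch. This is not the authors' implementation, and everything specific in it is an assumption: the audio feature is taken to be a linear projection of three raw features, MI is estimated with a simple histogram (plug-in) estimator rather than whatever nonparametric estimator the paper uses, the projection is optimized separately against each candidate, and the data are synthetic. All function and variable names (`mutual_information`, `differential_evolution`, `project`, `audio`, `video`) are hypothetical.

```python
import math
import random


def mutual_information(x, y, bins=8):
    """Histogram (plug-in) estimate of MI, in nats, between two 1-D samples."""
    n = len(x)

    def to_bins(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0  # guard against a constant signal
        return [min(int((v - lo) / width), bins - 1) for v in values]

    bx, by = to_bins(x), to_bins(y)
    joint, px, py = {}, [0] * bins, [0] * bins
    for i, j in zip(bx, by):
        joint[(i, j)] = joint.get((i, j), 0) + 1
        px[i] += 1
        py[j] += 1
    # sum over occupied cells of p(i,j) * log(p(i,j) / (p(i) * p(j)))
    return sum(c / n * math.log(c * n / (px[i] * py[j]))
               for (i, j), c in joint.items())


def differential_evolution(cost, dim, rng, pop_size=20, generations=40,
                           f=0.7, cr=0.9):
    """Basic DE/rand/1/bin minimizer; returns (best_vector, best_cost)."""
    pop = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(pop_size)]
    costs = [cost(p) for p in pop]
    for _ in range(generations):
        for i in range(pop_size):
            others = [p for k, p in enumerate(pop) if k != i]
            a, b, c = rng.sample(others, 3)
            forced = rng.randrange(dim)  # at least one mutated coordinate
            trial = [a[d] + f * (b[d] - c[d])
                     if (rng.random() < cr or d == forced) else pop[i][d]
                     for d in range(dim)]
            trial_cost = cost(trial)
            if trial_cost <= costs[i]:  # greedy one-to-one selection
                pop[i], costs[i] = trial, trial_cost
    best = min(range(pop_size), key=costs.__getitem__)
    return pop[best], costs[best]


def project(weights, frames):
    """Linear audio feature: one weighted sum per frame."""
    return [sum(w * v for w, v in zip(weights, frame)) for frame in frames]


rng = random.Random(0)
# Synthetic stand-in data (assumption): 300 frames, 3 raw audio features each.
audio = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(300)]
video = [
    [frame[0] + rng.gauss(0, 0.3) for frame in audio],  # candidate 0: tied to audio
    [rng.gauss(0, 1) for _ in audio],                   # candidate 1: unrelated
]

mi_scores = []
for v in video:
    # Maximize MI by minimizing its negative over the projection weights.
    _, neg_mi = differential_evolution(
        lambda w: -mutual_information(project(w, audio), v), 3, rng)
    mi_scores.append(-neg_mi)

detected = max(range(len(video)), key=mi_scores.__getitem__)
print("detected speaker:", detected, "MI per candidate:", mi_scores)
```

A histogram estimator is used here only because it fits in a few lines; it is biased for small samples, so the optimized MI of an unrelated candidate is above zero. The comparison between candidates, not the absolute MI value, carries the decision in this sketch.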

Patricia Besson, Vlad Popovici, Jean-Marc Vesin, Jean-Philippe Thiran, Murat Kunt

Information Theoretic Framework | MI-based Objective Function | Optimization Problem | TMM 2008

Post Info

Added	15 Dec 2010
Updated	15 Dec 2010
Type	Journal
Year	2008
Where	TMM
Authors	Patricia Besson, Vlad Popovici, Jean-Marc Vesin, Jean-Philippe Thiran, Murat Kunt

Sciweavers
