Sciweavers

ICASSP
2011
IEEE

Audiovisual classification of vocal outbursts in human conversation using Long-Short-Term Memory networks

We investigate the classification of non-linguistic vocalisations with a novel audiovisual approach, using Long Short-Term Memory (LSTM) Recurrent Neural Networks as highly successful dynamic sequence classifiers. The evaluation database is the Audiovisual Interest Corpus of natural human-to-human conversation, as used in this year’s Paralinguistic Challenge. For video-based analysis we compare shape-based and appearance-based features. These are combined with typical audio descriptors by early fusion. The results show significant improvements of LSTM networks over a static approach based on Support Vector Machines. More importantly, we show a significant performance gain when fusing audio and visual shape features.
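The early fusion of audio and visual features described above can be illustrated with a minimal sketch. It assumes both streams are already synchronised to the same frame rate, so that each frame has one audio vector and one visual vector. The function name `early_fusion` and the toy values are hypothetical and are not taken from the paper; the paper's actual feature extraction and classifier are not shown.

```python
from typing import List, Sequence


def early_fusion(audio: Sequence[Sequence[float]],
                 video: Sequence[Sequence[float]]) -> List[List[float]]:
    """Concatenate per-frame audio and visual feature vectors.

    Assumption: both streams have the same number of frames.
    """
    if len(audio) != len(video):
        raise ValueError("streams must have the same number of frames")
    # One fused vector per frame: audio descriptors followed by visual features.
    return [list(a) + list(v) for a, v in zip(audio, video)]


# Toy values only, not data from the paper.
audio_frames = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]  # 2 frames x 3 audio descriptors
shape_frames = [[1.0, 2.0], [3.0, 4.0]]            # 2 frames x 2 visual shape features

fused = early_fusion(audio_frames, shape_frames)
print(len(fused), len(fused[0]))  # prints "2 5"
```

The fused sequence would then be passed as a whole to a single sequence classifier, in contrast to late fusion, where separate classifiers are trained per modality and their decisions are combined.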
Florian Eyben, Stavros Petridis, Björn Schuller, Georgios Tzimiropoulos, Stefanos Zafeiriou, Maja Pantic
Added 21 Aug 2011
Updated 21 Aug 2011
Type Journal
Year 2011
Where ICASSP
Authors Florian Eyben, Stavros Petridis, Björn Schuller, Georgios Tzimiropoulos, Stefanos Zafeiriou, Maja Pantic