Sciweavers

ACII
2007
Springer

Frame vs. Turn-Level: Emotion Recognition from Speech Considering Static and Dynamic Processing

Abstract. In contrast to the predominant approach of computing turn-wise statistics of acoustic Low-Level Descriptors followed by static classification, we re-investigate dynamic modeling directly on the frame level in speech-based emotion recognition. This seems beneficial, as it is well known that important information exists on temporal sub-turn layers. Most promisingly, we integrate this frame-level information within a state-of-the-art large-feature-space emotion recognition engine. To investigate frame-level processing, we employ a typical speaker-recognition set-up tailored to emotion classification: a GMM for classification, with MFCCs plus speed and acceleration coefficients as features. We also consider the use of multiple states, that is, an HMM. To fuse this information with turn-based modeling, output scores are added to a super-vector combined with static acoustic features. Thereby a variety of Low-Level Descriptors and functionals to cover prosodic, spe...
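The fusion idea in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation. To keep it dependency-free, a single diagonal Gaussian per emotion stands in for the GMM, toy two-dimensional frames stand in for MFCCs, and speed and acceleration are approximated by simple central differences. All function names (`add_dynamics`, `fit_gaussian`, `avg_loglik`, `turn_functionals`, `supervector`) and the two class labels are hypothetical.

```python
import math

def deltas(frames):
    """Central differences along time with edge padding (assumed stand-in for regression deltas)."""
    n = len(frames)
    out = []
    for t in range(n):
        prev = frames[max(t - 1, 0)]
        nxt = frames[min(t + 1, n - 1)]
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out

def add_dynamics(frames):
    """Append speed (delta) and acceleration (delta-delta) coefficients to each frame."""
    d1 = deltas(frames)
    d2 = deltas(d1)
    return [f + a + b for f, a, b in zip(frames, d1, d2)]

def fit_gaussian(frames, floor=1e-3):
    """Per-dimension mean and floored variance: a one-component, diagonal 'GMM' (simplification)."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[i] for f in frames) / n for i in range(dim)]
    var = [max(sum((f[i] - mean[i]) ** 2 for f in frames) / n, floor)
           for i in range(dim)]
    return mean, var

def avg_loglik(frames, model):
    """Average per-frame log-likelihood of a turn under one class model."""
    mean, var = model
    total = 0.0
    for f in frames:
        for x, m, v in zip(f, mean, var):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total / len(frames)

def turn_functionals(frames):
    """Static turn-level statistics: mean and standard deviation per dimension."""
    mean, var = fit_gaussian(frames, floor=0.0)
    return mean + [math.sqrt(v) for v in var]

def supervector(frames, models):
    """Concatenate frame-level class scores with static turn-level features."""
    dyn = add_dynamics(frames)
    scores = [avg_loglik(dyn, models[label]) for label in sorted(models)]
    return scores + turn_functionals(frames)

# Toy training turns (made-up numbers, one turn per class).
calm = [[0.0, 1.0], [0.1, 1.1], [0.0, 0.9], [0.1, 1.0]]
angry = [[2.0, -1.0], [2.5, -0.5], [1.5, -1.5], [2.2, -0.8]]
models = {
    "angry": fit_gaussian(add_dynamics(angry)),
    "calm": fit_gaussian(add_dynamics(calm)),
}

test_turn = [[0.05, 1.0], [0.1, 0.95], [0.0, 1.05]]
vec = supervector(test_turn, models)
print(vec)  # [angry score, calm score, turn means..., turn standard deviations...]
```

In this sketch the first two entries of `vec` are the frame-level scores (in alphabetical class order) and the remaining entries are turn-level functionals; such a vector could then be passed to any static classifier.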
Added 06 Jun 2010
Updated 06 Jun 2010
Type Conference
Year 2007
Where ACII
Authors Bogdan Vlasenko, Björn Schuller, Andreas Wendemuth, Gerhard Rigoll