Frame vs. Turn-Level: Emotion Recognition from Speech Considering Static and Dynamic Processing


Download www.mmk.ei.tum.de

Abstract. In contrast to the predominant approach of computing turn-wise statistics of acoustic Low-Level-Descriptors followed by static classification, we re-investigate dynamic modeling directly on the frame level in speech-based emotion recognition. This seems beneficial, as important information is known to exist on temporal sub-turn layers. Most promisingly, we integrate this frame-level information into a state-of-the-art large-feature-space emotion recognition engine. To investigate frame-level processing, we employ a typical speaker-recognition set-up tailored to emotion classification: a GMM for classification, with MFCC plus speed and acceleration coefficients as features. We also consider the use of multiple states, that is, an HMM. To fuse this information with turn-based modeling, output scores are added to a super-vector combined with static acoustic features. Thereby a variety of Low-Level-Descriptors and functionals to cover prosodic, spe...
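The frame-level pipeline outlined in the abstract (MFCC frames extended with speed and acceleration coefficients, scored by one GMM per emotion, with the scores appended to the turn-level feature vector) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the GMM parameters are supplied by hand rather than trained, covariances are assumed diagonal, deltas use a simple two-point difference, and all function names, class labels, and numbers are hypothetical.

```python
import math

def add_deltas(frames):
    """Append first ("speed") and second ("acceleration") differences to each frame.
    Assumption: two-point central difference with edge replication."""
    def diff(seq):
        n = len(seq)
        out = []
        for t in range(n):
            prev = seq[max(t - 1, 0)]
            nxt = seq[min(t + 1, n - 1)]
            out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
        return out
    d1 = diff(frames)
    d2 = diff(d1)
    return [f + a + b for f, a, b in zip(frames, d1, d2)]

def gmm_frame_loglik(x, gmm):
    """Log-likelihood of one frame under a diagonal-covariance GMM.
    gmm is a list of (weight, means, variances) tuples (hypothetical layout)."""
    comps = []
    for w, mu, var in gmm:
        ll = math.log(w)
        for xi, m, v in zip(x, mu, var):
            ll += -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        comps.append(ll)
    # Log-sum-exp over mixture components for numerical stability.
    top = max(comps)
    return top + math.log(sum(math.exp(c - top) for c in comps))

def turn_scores(frames, class_gmms):
    """Average frame log-likelihood of a whole turn, one score per emotion class."""
    return {
        label: sum(gmm_frame_loglik(x, gmm) for x in frames) / len(frames)
        for label, gmm in class_gmms.items()
    }

def fuse(static_features, scores, class_order):
    """Build the super-vector: static turn-level features plus frame-level scores."""
    return list(static_features) + [scores[c] for c in class_order]

# Toy usage: three one-dimensional "MFCC" frames (made-up values).
frames = add_deltas([[0.0], [1.0], [2.0]])
class_gmms = {
    "anger": [(0.5, [0.0, 0.5, 0.0], [1.0, 1.0, 1.0]),
              (0.5, [2.0, 0.5, 0.0], [1.0, 1.0, 1.0])],
    "neutral": [(1.0, [10.0, 0.0, 0.0], [1.0, 1.0, 1.0])],
}
scores = turn_scores(frames, class_gmms)
super_vector = fuse([0.3, 1.2], scores, ["anger", "neutral"])
```

The resulting super-vector would then be passed to whatever static classifier handles the turn-level features; the abstract as shown here does not say which one, so that step is left out.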

Bogdan Vlasenko, Björn Schuller, Andreas Wendemuth, Gerhard Rigoll


ACII 2007 | Applied Computing | Emotion Recognition | Large-feature-space Emotion Recognition | Speech-based Emotion Recognition |


Added	06 Jun 2010
Updated	06 Jun 2010
Type	Conference
Year	2007
Where	ACII
Authors	Bogdan Vlasenko, Björn Schuller, Andreas Wendemuth, Gerhard Rigoll
