Multi-modal speaker diarization of real-world meetings using compressed-domain video features



Speaker diarization is originally defined as the task of determining “who spoke when” given an audio track and no other prior knowledge of any kind. The following article shows a multi-modal approach where we improve a state-of-the-art speaker diarization system by combining standard acoustic features (MFCCs) with compressed-domain video features. The approach is evaluated on over 4.5 hours of the publicly available AMI meetings dataset, which contains challenges such as people standing up and walking out of the room. We show a consistent improvement of about 34% relative in speaker error rate (21% DER) compared to a state-of-the-art audio-only baseline.
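The "relative" improvement quoted in the abstract is a reduction expressed as a fraction of the baseline error, not a difference in absolute percentage points. A minimal sketch of that arithmetic follows; the function name and the input error rates are hypothetical, chosen only for illustration, and are not figures from the paper.

```python
def relative_reduction(baseline_error: float, system_error: float) -> float:
    """Return the error reduction as a fraction of the baseline error."""
    if baseline_error <= 0:
        raise ValueError("baseline error must be positive")
    return (baseline_error - system_error) / baseline_error


# Hypothetical error rates in percent (illustrative only, not from the paper).
audio_only_error = 20.0
multimodal_error = 15.0

reduction = relative_reduction(audio_only_error, multimodal_error)
print(f"{reduction:.0%} relative reduction")
```

With these made-up inputs, an absolute drop of 5 percentage points corresponds to a quarter of the baseline error, which is why relative and absolute figures should not be compared directly.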

Gerald Friedland, Hayley Hung, Chuohao Yeo


ICASSP 2009 | Signal Processing | Speaker Diarization | Speaker Error Rate | State-of-the-art Speaker Diarization


Post Info

Added	21 May 2010
Updated	21 May 2010
Type	Conference
Year	2009
Where	ICASSP
Authors	Gerald Friedland, Hayley Hung, Chuohao Yeo


