
INTERSPEECH
2010

Multimodal speaker diarization using oriented optical flow histograms

Speaker diarization is the task of partitioning an input stream into speaker-homogeneous regions, in other words, determining "who spoke when." While approaches to this problem have traditionally relied entirely on the audio stream, the availability of accompanying video streams in recent diarization corpora has prompted the study of methods based on multimodal audio-visual features. In this work, we propose the use of robust video features based on oriented optical flow histograms. Using the state-of-the-art ICSI diarization system, we show that, when combined with standard audio features, these features improve the diarization error rate by 14% over an audio-only baseline.
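The abstract does not specify how the histograms are built. As a rough illustration only, the sketch below assumes a dense optical flow field has already been computed by some flow estimator and is given as a list of per-pixel `(dx, dy)` vectors. Each vector's orientation is binned into `n_bins` equal angular sectors, each vote is weighted by the vector's magnitude, and the histogram is normalized to sum to one. The function name `oriented_flow_histogram`, the bin count, the magnitude weighting, and the normalization are all assumptions made for this example, not the authors' published method.

```python
import math
from typing import Iterable, List, Tuple


def oriented_flow_histogram(
    flow: Iterable[Tuple[float, float]],
    n_bins: int = 8,
    min_magnitude: float = 0.0,
) -> List[float]:
    """Magnitude-weighted histogram of optical-flow orientations.

    Hypothetical helper (not from the paper): each (dx, dy) vector votes for
    the orientation bin containing atan2(dy, dx), weighted by its magnitude.
    The result is normalized to sum to 1 so that frames with more or less
    overall motion remain comparable.
    """
    if n_bins <= 0:
        raise ValueError("n_bins must be positive")
    hist = [0.0] * n_bins
    bin_width = 2.0 * math.pi / n_bins
    for dx, dy in flow:
        mag = math.hypot(dx, dy)
        if mag <= min_magnitude:
            continue  # skip static (or near-static) pixels
        angle = math.atan2(dy, dx) % (2.0 * math.pi)  # map to [0, 2*pi)
        idx = min(int(angle / bin_width), n_bins - 1)  # guard float edge at 2*pi
        hist[idx] += mag
    total = sum(hist)
    if total > 0:
        hist = [h / total for h in hist]  # L1 normalization
    return hist


if __name__ == "__main__":
    # Four diagonal motions of increasing strength, one per quadrant.
    flow = [(1, 1), (-2, 2), (-3, -3), (4, -4)]
    print(oriented_flow_histogram(flow, n_bins=4))
```

Normalizing by total magnitude keeps the feature dependent on the direction of motion rather than its overall amount; how such per-frame vectors would be combined with the audio features is not described in the abstract and is left open here.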
Mary Tai Knox, Gerald Friedland
Type Conference
Year 2010
Where INTERSPEECH
Authors Mary Tai Knox, Gerald Friedland