ISM
2008
IEEE

Multimodal Speaker Segmentation in Presence of Overlapped Speech Segments

We propose a multimodal speaker segmentation algorithm with two main contributions. First, we suggest a hidden Markov model architecture that fuses three modalities: a multi-camera system for participant localization, a microphone array for speaker localization, and a speaker identification system. Second, we present a novel method for dealing with overlapped speech segments through a likelihood model of the microphone array observations that uses multiple local maxima of the Steered Power Response Generalized Cross Correlation Phase Transform (SPR-GCC-PHAT) function in the Joint Probabilistic Data Association (JPDA) framework. Results show that the proposed method outperforms standard speaker segmentation systems based on (a) speaker identification and (b) microphone array processing on datasets with a significant portion (27.4%) of overlapped speech, and achieves F-measure scores as high as 94.4%.
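To make the fusion idea concrete, here is a minimal sketch of how per-frame scores from three modalities could be combined in a hidden Markov model and decoded with the Viterbi algorithm. It is not the authors' architecture: the function name `viterbi_fusion`, the two-speaker state space, the toy probabilities, and the assumption that the modalities are conditionally independent given the active speaker are all illustrative. The sketch also ignores overlapped speech and the JPDA-based likelihood model entirely.

```python
import math


def viterbi_fusion(modality_loglikes, log_trans, log_prior):
    """Decode the most likely speaker sequence from fused per-frame scores.

    modality_loglikes: one entry per modality; each entry is a list of
        frames, and each frame is a list of per-state log-likelihoods.
    log_trans[i][j]: log probability of moving from state i to state j.
    log_prior[i]: log probability of starting in state i.
    """
    n_frames = len(modality_loglikes[0])
    n_states = len(log_prior)

    # Fuse modalities by summing log-likelihoods. This assumes conditional
    # independence given the state (an assumption of this sketch).
    fused = [
        [sum(m[t][s] for m in modality_loglikes) for s in range(n_states)]
        for t in range(n_frames)
    ]

    score = [log_prior[s] + fused[0][s] for s in range(n_states)]
    backptr = []
    for t in range(1, n_frames):
        new_score, ptr = [], []
        for j in range(n_states):
            # Best predecessor state for landing in state j at frame t.
            best_i = max(range(n_states),
                         key=lambda i: score[i] + log_trans[i][j])
            new_score.append(score[best_i] + log_trans[best_i][j] + fused[t][j])
            ptr.append(best_i)
        score = new_score
        backptr.append(ptr)

    # Trace the best path backwards from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


def to_log(rows):
    """Convert nested lists of probabilities to log probabilities."""
    return [[math.log(p) for p in row] for row in rows]


# Toy data (hypothetical): 2 speakers, 4 frames, 3 modalities.
video = to_log([[0.8, 0.2], [0.8, 0.2], [0.2, 0.8], [0.2, 0.8]])
mic_array = to_log([[0.8, 0.2], [0.8, 0.2], [0.2, 0.8], [0.2, 0.8]])
speaker_id = to_log([[0.6, 0.4], [0.4, 0.6], [0.4, 0.6], [0.6, 0.4]])

log_trans = to_log([[0.9, 0.1], [0.1, 0.9]])  # "sticky" speaker turns
log_prior = [math.log(0.5), math.log(0.5)]

path = viterbi_fusion([video, mic_array, speaker_id], log_trans, log_prior)
print(path)  # [0, 0, 1, 1]
```

In this toy run the speaker identification stream disagrees with the two localization streams in frames 1 and 3, but the fused score still yields a single speaker change between frames 1 and 2.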
Added 31 May 2010
Updated 31 May 2010
Type Conference
Year 2008
Where ISM
Authors Viktor Rozgic, Kyu Jeong Han, Panayiotis G. Georgiou, Shrikanth S. Narayanan