We propose a multimodal speaker segmentation algorithm with two main contributions: First, we suggest a hidden Markov model architecture that performs fusion of the three modaliti...
Viktor Rozgic, Kyu Jeong Han, Panayiotis G. Georgi...
This work investigates the validity and accuracy of using spatial cues with Time-Delay Estimation (TDE) as a method of segmenting multichannel recorded speech by speaker location....
Eva Cheng, Jason Lukasiak, Ian S. Burnett, David S...
To better understand spontaneous speech-related phenomena and to improve automatic speech recognition (ASR), we present a study on the relationship between t...
Martine Adda-Decker, Claude Barras, Gilles Adda, P...
Relationships that link static documents discussed during meetings to the corresponding speech transcripts can be of various kinds. The most important ones, thematic links, quotat...
In the domain of candidly captured student presentation videos, we examine and evaluate approaches for multimodal analysis and indexing of audio and video. We apply visual segment...