Audio-visual speaker diarisation is the task of estimating “who spoke when” using audio and visual cues. In this paper we propose the combination of an audio diarisation syste...
Multi-stream hidden Markov models (HMMs) have recently been very successful in audio-visual speech recognition, where the audio and visual streams are fused at the final decision...