Sciweavers

ICASSP
2010
IEEE

Speech/Non-Speech Detection in Meetings from Automatically Extracted Low Resolution Visual Features

In this paper, we address the problem of estimating who is speaking from automatically extracted low-resolution visual cues in group meetings. Traditionally, the task of speech/non-speech detection or speaker diarization tries to find “who speaks and when” from audio features only. Here, we investigate more systematically how speaking status can be estimated from low-resolution video. We exploit the synchrony of a group’s head and hand motion to learn correspondences between speaking status and visual activity. We also carry out experiments to evaluate how context, through the observation of group behaviour and task-oriented activities, can help to improve estimates of speaking status. We test on 105 minutes of natural meeting data with unconstrained conversations and compare with state-of-the-art audio-only methods.
Hayley Hung, Sileye O. Ba
Added 25 Jan 2011
Updated 25 Jan 2011
Type Journal
Year 2010
Where ICASSP