Sciweavers

ICMI
2003
Springer

A multi-modal approach for determining speaker location and focus

This paper presents a multi-modal approach to locate a speaker in a scene and determine to whom he or she is speaking. We present a simple probabilistic framework that combines multiple cues derived from both audio and video information. A purely visual cue is obtained using a head tracker to identify possible speakers in a scene and provide both their 3-D positions and orientation. In addition, estimates of the audio signal’s direction of arrival are obtained with the help of a two-element microphone array. A third cue measures the association between the audio and the tracked regions in the video. Integrating these cues provides a more robust solution than using any single cue alone. The usefulness of our approach is shown in our results for video sequences with two or more people in a prototype interactive kiosk environment.
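The abstract does not spell out the probabilistic framework, so the following is only an illustrative sketch of one common way to combine several cues, not the authors' method. It assumes the cues are conditionally independent given the speaker's identity (a naive-Bayes style fusion), and that each cue has already been turned into a per-person likelihood. The function name `fuse_cues` and all numbers are hypothetical.

```python
import math

def fuse_cues(cue_likelihoods, prior=None, floor=1e-12):
    """Fuse per-person likelihoods from several cues into a posterior.

    ASSUMPTION (not from the paper): cues are conditionally independent
    given who is speaking, so their likelihoods multiply.
    cue_likelihoods: list of lists, one inner list per cue,
                     one entry per candidate speaker.
    """
    n = len(cue_likelihoods[0])
    if prior is None:
        prior = [1.0 / n] * n  # uniform prior over candidates

    # Accumulate in log space to avoid underflow with many cues.
    log_post = [math.log(max(p, floor)) for p in prior]
    for cue in cue_likelihoods:
        for i, lik in enumerate(cue):
            log_post[i] += math.log(max(lik, floor))

    # Normalize with the max-subtraction trick for numerical stability.
    m = max(log_post)
    unnorm = [math.exp(lp - m) for lp in log_post]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Hypothetical likelihoods for two tracked people.
visual_cue = [0.5, 0.5]    # head tracker: both are plausible speakers
audio_doa_cue = [0.7, 0.3] # direction of arrival favors person 0
av_assoc_cue = [0.6, 0.4]  # audio-video association favors person 0

posterior = fuse_cues([visual_cue, audio_doa_cue, av_assoc_cue])
speaker = max(range(len(posterior)), key=posterior.__getitem__)
```

In this toy setup the visual cue alone cannot separate the two candidates, while the fused posterior can, which is the kind of robustness the abstract attributes to integrating cues.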
Added 07 Jul 2010
Updated 07 Jul 2010
Type Conference
Year 2003
Where ICMI
Authors Michael Siracusa, Louis-Philippe Morency, Kevin Wilson, John W. Fisher III, Trevor Darrell