Exploring Co-Occurence Between Speech and Body Movement for Audio-Guided Video Localization

This paper presents a bottom-up approach that combines audio and video to simultaneously locate individual speakers in the video (2-D source localization) and segment their speech (speaker diarization) in meetings recorded by a single stationary camera and a single microphone. The novelty lies in using motion information from the entire body, rather than just the face, to perform these tasks; unlike previous work, this permits processing of nonfrontal views. Since body movements do not exhibit instantaneous signal-level synchrony with speech, the approach targets long-term co-occurrences between audio and video subspaces. First, temporal clustering of the audio produces a large number of intermediate clusters, each containing speech from only a single speaker. Then, spatial clustering is performed in the video frames of each cluster by a novel eigen-analysis method to find the region of dominant motion. This region is associated with the speech assuming that a speaker exhibits more movemen...
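The abstract does not spell out the eigen-analysis used for the spatial clustering step. As a rough illustration only, and not the authors' method, the sketch below finds a region of dominant motion within the frames of one audio cluster: it takes absolute frame differences, computes the dominant eigenvector of their uncentered second-moment matrix by power iteration, and returns the bounding box of pixels whose loadings exceed a fraction of the peak. The function names, the `rel_thresh` parameter, and the synthetic frames are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Frame = List[List[float]]  # hypothetical: grayscale frame as rows of intensities


def motion_vectors(frames: Sequence[Frame]) -> List[List[float]]:
    """Flatten absolute frame differences |F[t+1] - F[t]| into row vectors."""
    rows = []
    for prev, curr in zip(frames, frames[1:]):
        rows.append([abs(c - p) for prow, crow in zip(prev, curr)
                     for p, c in zip(prow, crow)])
    return rows


def dominant_eigenvector(rows: List[List[float]], iters: int = 100,
                         seed: int = 0) -> List[float]:
    """Power iteration on X^T X (uncentered) without forming it explicitly."""
    n = len(rows[0])
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(n)]  # positive start vector
    for _ in range(iters):
        xv = [sum(r_j * v_j for r_j, v_j in zip(r, v)) for r in rows]  # X v
        w = [sum(xv[t] * rows[t][j] for t in range(len(rows)))
             for j in range(n)]  # X^T (X v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return [0.0] * n  # no motion anywhere in this cluster
        v = [x / norm for x in w]
    return v


def dominant_motion_region(frames: Sequence[Frame], rel_thresh: float = 0.5
                           ) -> Optional[Tuple[int, int, int, int]]:
    """Bounding box (top, left, bottom, right) of strongly loaded pixels."""
    if len(frames) < 2:
        return None
    width = len(frames[0][0])
    v = dominant_eigenvector(motion_vectors(frames))
    peak = max(abs(x) for x in v)
    if peak == 0.0:
        return None
    idx = [i for i, x in enumerate(v) if abs(x) >= rel_thresh * peak]
    ys = [i // width for i in idx]
    xs = [i % width for i in idx]
    return min(ys), min(xs), max(ys), max(xs)


def make_frames(num: int = 6, h: int = 4, w: int = 8) -> List[Frame]:
    """Synthetic clip: a 2x2 patch flickers, everything else is static."""
    frames = []
    for t in range(num):
        f = [[10.0] * w for _ in range(h)]
        for y in range(1, 3):
            for x in range(5, 7):
                f[y][x] = 10.0 + (50.0 if t % 2 else 0.0)
        frames.append(f)
    return frames


if __name__ == "__main__":
    print(dominant_motion_region(make_frames()))  # (1, 5, 2, 6)
```

Using the uncentered second-moment matrix, rather than a covariance, keeps the leading eigenvector aligned with where motion energy is concentrated; a real system would also need the association step and would likely operate on far larger frames with a numerical library.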

H. Vajaria, S. Sarkar, R. Kasturi

2-D Source Localization | Intermediate Cluster | Speaker Diarization | TCSV 2008

Post Info

Added	29 Dec 2010
Updated	29 Dec 2010
Type	Journal
Year	2008
Where	TCSV
Authors	H. Vajaria, S. Sarkar, R. Kasturi

Sciweavers