CLEAR 2006, Springer

An Audio-Visual Particle Filter for Speaker Tracking on the CLEAR'06 Evaluation Dataset

We present an approach for tracking a lecturer during a lecture. We use features from multiple cameras and microphones and fuse them in a joint particle filter framework. The filter performs sampled projections of 3D location hypotheses and scores them using features from both audio and video. On the video side, the features are based on foreground segmentation, multi-view face detection, and upper-body detection. On the audio side, the time delays of arrival (TDOA) between pairs of microphones are estimated with a generalized cross correlation function. In the CLEAR'06 evaluation, the system yielded a multiple object tracking accuracy (MOTA) of 71% for video-only, 55% for audio-only, and 90% for combined audio-visual tracking.
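The audio step described above, estimating TDOAs between microphone pairs with a generalized cross correlation, can be sketched as follows. This is an illustrative sketch, not the paper's code: the abstract does not specify the weighting, so the common PHAT variant is assumed, and the function name `gcc_phat` and all parameters are hypothetical.

```python
import numpy as np

def gcc_phat(x, y, fs=16000, max_tau=None):
    """Estimate the time delay of arrival (TDOA) of x relative to y via
    generalized cross correlation with PHAT weighting (assumed variant).
    Returns seconds; positive means x arrives later than y."""
    n = len(x) + len(y)                      # zero-pad to avoid circular wrap-around
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    R = X * np.conj(Y)
    R /= np.abs(R) + 1e-12                   # PHAT: keep phase only, flatten magnitude
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2
    if max_tau is not None:                  # optionally restrict to physically possible lags
        max_shift = min(int(fs * max_tau), max_shift)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))  # center lag 0
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / fs

# Synthetic check: x is the same signal delayed by 10 samples relative to y.
fs = 16000
rng = np.random.default_rng(0)
s = rng.standard_normal(4096)
x = np.concatenate([np.zeros(10), s])
y = np.concatenate([s, np.zeros(10)])
tau = gcc_phat(x, y, fs=fs)                  # ~10 / 16000 seconds
```

In a multi-microphone setup like the one evaluated here, such pairwise delays would score each sampled 3D location hypothesis by comparing the measured TDOA against the delay predicted from the hypothesis and the known microphone geometry.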
Added: 20 Aug 2010
Updated: 20 Aug 2010
Type: Conference
Year: 2006
Where: CLEAR
Authors: Kai Nickel, Tobias Gehrig, Hazim Kemal Ekenel, John W. McDonough, Rainer Stiefelhagen