
ICASSP
2011
IEEE

Improving acoustic event detection using generalizable visual features and multi-modality modeling

Acoustic event detection (AED) aims to identify both the timestamps and the types of multiple events, and has been found to be very challenging. The cues for these events often exist in both audio and vision, but not necessarily in a synchronized fashion. We study improving the detection and classification of the events using cues from both modalities. We propose optical-flow-based spatial pyramid histograms as a generalizable visual representation that does not require training on labeled video data. Hidden Markov models (HMMs) are used for audio-only modeling, and multi-stream HMMs or coupled HMMs (CHMMs) are used for audio-visual joint modeling. To allow the flexibility of audio-visual state asynchrony, we explore effective CHMM training via HMM state-space mapping, parameter tying and different initialization schemes. The proposed methods successfully improve acoustic event classification and detection on a multimedia meeting room dataset containing eleven types of general non-spe...
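To make the visual representation in the abstract more concrete, the following is a minimal sketch of a spatial pyramid histogram computed over an optical flow field. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the number of pyramid levels, the number of orientation bins, the weighting, or the normalization, so the function names, `levels=3`, `n_bins=8`, magnitude weighting and L1 normalization are all hypothetical choices. The flow field is assumed to be already computed and is passed in as a 2D list of `(dx, dy)` vectors.

```python
import math


def orientation_histogram(flow, rows, cols, n_bins=8):
    """Magnitude-weighted histogram of flow directions inside one cell.

    flow : 2D list of (dx, dy) motion vectors, one per pixel (assumed input)
    rows : iterable of row indices belonging to the cell
    cols : iterable of column indices belonging to the cell
    """
    hist = [0.0] * n_bins
    for r in rows:
        for c in cols:
            dx, dy = flow[r][c]
            mag = math.hypot(dx, dy)
            if mag == 0.0:
                continue  # pixels with no motion contribute nothing
            angle = math.atan2(dy, dx) % (2 * math.pi)  # map to [0, 2*pi)
            b = min(int(angle / (2 * math.pi) * n_bins), n_bins - 1)
            hist[b] += mag
    return hist


def spatial_pyramid_histogram(flow, levels=3, n_bins=8):
    """Concatenate cell histograms over a pyramid of 1x1, 2x2, 4x4, ... grids.

    The pyramid depth, bin count and L1 normalization are illustrative
    assumptions; the source abstract does not specify them.
    """
    height, width = len(flow), len(flow[0])
    feature = []
    for level in range(levels):
        cells = 2 ** level  # the grid is cells x cells at this level
        for i in range(cells):
            r0, r1 = i * height // cells, (i + 1) * height // cells
            for j in range(cells):
                c0, c1 = j * width // cells, (j + 1) * width // cells
                feature.extend(
                    orientation_histogram(flow, range(r0, r1), range(c0, c1), n_bins)
                )
    total = sum(feature)
    return [v / total for v in feature] if total > 0 else feature


# Usage: a 4x4 field in which every pixel moves one unit to the right.
flow = [[(1.0, 0.0)] * 4 for _ in range(4)]
feature = spatial_pyramid_histogram(flow, levels=2, n_bins=8)
print(len(feature))  # (1 + 4) cells * 8 bins = 40 values
```

Because the descriptor is built directly from motion statistics, it needs no labeled video to compute, which is consistent with the abstract's description of the representation as generalizable. In a pipeline like the one described, one such vector per video frame (or frame window) could serve as the visual observation stream alongside the audio features.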
Po-Sen Huang, Xiaodan Zhuang, Mark Hasegawa-Johnson
Added 21 Aug 2011
Updated 21 Aug 2011
Type Journal
Year 2011
Where ICASSP
Authors Po-Sen Huang, Xiaodan Zhuang, Mark Hasegawa-Johnson