Conventional image categorization techniques primarily rely on low-level visual cues. In this paper, we describe a multimodal fusion scheme which improves the image classification...
Acoustic event detection (AED) aims to identify both timestamps and types of multiple events and has been found to be very challenging. The cues for these events often times exist...
Po-Sen Huang, Xiaodan Zhuang, Mark Hasegawa-Johnso...
My thesis aims to contribute towards building autonomous agents that are able to understand their surrounding environment through the use of both audio and visual information. To ...
This paper presents a novel approach to automatically identify characters in films using audio visual cues and text analysis. The approach consists of three stages: (i) frontal f...
We present our studies on the application of Coupled Hidden Markov Models(CHMMs) to sports highlights extraction from broadcast video using both audio and video information. First,...