We present an approach for tracking a lecturer during the course of his speech. We use features from multiple cameras and microphones, and process them in a joint particle filter f...
Kai Nickel, Tobias Gehrig, Hazim Kemal Ekenel, Joh...
In this paper, we propose a unified graphical-model framework to interpret a scene composed of multiple objects in monocular video sequences. Using a single pairwise Markov random...
Tracking humans in an indoor environment is an essential part of surveillance systems. Vision-based and microphone-array-based trackers have been extensively researched in the pas...
Shankar T. Shivappa, Mohan M. Trivedi, Bhaskar D. ...
Although the availability of large video corpora is on the rise, the value of these datasets remains largely untapped due to the difficulty of analyzing their contents. Automatic ...
The complexity of visual representations is substantially limited by the compositional nature of our visual world, which therefore renders learning structured object mod...