This paper presents a multi-modal approach to locate a speaker in a scene and determine to whom he or she is speaking. We present a simple probabilistic framework that combines mu...
Michael Siracusa, Louis-Philippe Morency, Kevin Wi...
The following paper presents a novel audio-visual approach for unsupervised speaker localization. Using recordings from a single, low-resolution room overview camera and a single f...
Abstract. In this chapter we report on an investigation into the principles underlying the choice of a particular referential expression to refer to an object located in a domain t...
A multi-image focus of attention mechanism has been developed that can quickly distinguish raised objects like buildings from structured background clutter typical of many aerial ...
A key step in the process of lip-reading is determining the shape of the speaker’s lips. This has previously been achieved through an energy method known as “snakes”, howeve...