Grounded language models represent the relationship between words and the non-linguistic context in which they are said. This paper describes how they are learned from large corpo...
The present work aims to model the correspondence between facial motion and speech. The face and sound are modelled separately, with phonemes serving as the link between the two. We propo...
This paper presents a novel method to model and recognize human faces in video sequences. Each registered person is represented by a low-dimensional appearance manifold in the amb...
This paper argues that tracking, object detection, and model-building are all similar activities. We describe a fully automatic system that builds 2D articulated models known as pi...
We propose acquiring high-resolution sequences of a person’s face using a pan-tilt-zoom (PTZ) network camera. This capability should prove helpful in forensic a...