We present a probabilistic method for audio-visual (AV) speaker tracking, using an uncalibrated wide-angle camera and a microphone array. The algorithm fuses 2-D object shape and ...
Daniel Gatica-Perez, Guillaume Lathoud, Iain McCow...
Cross-modal analysis offers information beyond that extracted from individual modalities. Consider a camcorder having a single microphone in a cocktail-party: it captures several ...
Automatic image annotation techniques that try to identify the objects in images usually need the images to be segmented first, especially when specifically annotating image reg...
Most modern computer vision systems for high-level
tasks, such as image classification, object recognition and
segmentation, are based on learning algorithms that are
able to se...
In object recognition, soft-assignment coding enjoys computational efficiency and conceptual simplicity. However, its classification performance is inferior to the newly develop...