Images and videos can be indexed by multiple features at different levels, such as color, texture, motion, and text annotation. Organizing this information into a system so that u...
Spatial priors play crucial roles in many high-level vision tasks, e.g. scene understanding. Usually, learning spatial priors relies on training a structured output model. In this...
ion when we annotate content. This therefore requires us to investigate and model video semantics. Because of the type and volume of data, general-purpose approaches are likely to ...
Given a video and associated text, we propose an automatic annotation scheme in which we employ a latent topic model to generate topic distributions from weighted text and then mo...
Chris Engels, Koen Deschacht, Jan Hendrik Becker, ...
We present a holistic data-driven approach to image description generation, exploiting the vast amount of (noisy) parallel image data and associated natural language descriptions ...
Polina Kuznetsova, Vicente Ordonez, Alexander C. B...