Abstract. Can we discover audio-visually consistent events from videos in a totally unsupervised manner? And how can we mine videos of different genres? In this paper we present our...
One application of content-based video annotation and retrieval is in the sports domain, where videos are annotated to produce short summaries for news and sp...
Lamberto Ballan, Marco Bertini, Alberto Del Bimbo,...
Automatic Language Identification (LID) in music has received significantly less attention than LID in speech. Here, we study the problem of LID in music videos uploaded on YouT...
Vijay Chandrasekhar, Mehmet Emre Sargin, David A. ...
We study the usefulness of intermediate semantic concepts in bridging the semantic gap in automatic video retrieval. The results of a series of large-scale retrieval experiments, w...
Grounded language models represent the relationship between words and the non-linguistic context in which they are said. This paper describes how they are learned from large corpo...