This paper presents a multi-modal approach to locate a speaker in a scene and determine to whom he or she is speaking. We present a simple probabilistic framework that combines mu...
Michael Siracusa, Louis-Philippe Morency, Kevin Wi...
Partitioning a video sequence into shots is the first step toward video-content analysis and content-based video browsing and retrieval. A video shot is defined as a series of inte...
Text-based search using video speech transcripts is a popular approach for granular video retrieval at the shot or story level. However, misalignment of speech and visual tracks, ...
Abstract. In this paper, we describe an unsupervised learning framework to segment a scene into semantic regions and to build semantic scene models from longterm observations of mo...
Visualization of multidimensional data by means of Multidimensional Scaling (MDS) is a popular technique of exploratory data analysis widely usable, e.g. in analysis of bio-medica...