The manual transcription of human gesture behavior from video for linguistic analysis is a work-intensive process that results in a rather coarse description of the original motio...
Multimodal speech and speaker modeling and recognition are widely accepted as vital aspects of state of the art human-machine interaction systems. While correlations between speec...
Mehmet Emre Sargin, Oya Aran, Alexey Karpov, Ferda...
This paper presents a system for another input modality in a multimodal human-machine interaction scenario. In addition to other common input modalities, e.g. speech, we extract he...
We investigate methods of segmenting, visualizing, and indexing presentation videos by both audio and visual data. The audio track is segmented by speaker, and augmented with key ...
In this paper, we discuss the potential for effective representations of video data to aid analysis of large datasets of video clips and describe a prototype developed to explore ...