We propose a new two-stage framework for joint analysis of head gesture and speech prosody patterns of a speaker toward automatic realistic synthesis of head gestures from speech p...
In this paper, we address the problem of extracting important (and unimportant) discourse patterns from call center conversations. Call centers provide dialog-based calling-in supp...
Anup Chalamalla, Sumit Negi, L. Venkata Subramania...
The automatic transcription of broadcast news and meetings involves the segmentation, identification and tracking of speaker turns during each session, which is known as speaker di...
To bridge the semantic gap in content-based image retrieval, detecting meaningful visual entities (e.g., faces, sky, foliage, buildings, etc.) in image content and classifying images...
We present a multi-camera system for audio-visual analysis of dance figures. The multi-view video of a dancing actor is acquired using 8 synchronized cameras. The motion capture t...