Understanding of the scene content of a video sequence is very important for content-based indexing and retrieval of multimedia databases. Research in this area in the past severa...
In this paper, we compare several approaches for the extraction of modulation frequency features from speech signal using a phoneme recognition system. The general framework in th...
In this paper, we present an algorithm for the tracking of target speakers in telephone conversations. Speaker tracking consists in retrieving, in an audio recording, segments whi...
We investigate various ways of generating prosodic syllable contour features that have recently been applied to enhance systems for speaker recognition. We compare different appro...
We describe a system for automatically extracting dynamics of tongue gestures from ultrasound images of the tongue using translational deep belief networks (tDBNs). In tDBNs, a jo...