My thesis aims to contribute towards building autonomous agents that are able to understand their surrounding environment through the use of both audio and visual information. To ...
For the purpose of Multimodal Meeting Manager Project (M4), an approach based on face and a hand tracking is proposed. The technique essentially includes skin color detection, seg...
Incremental processing is relevant for language modeling, speech recognition and language generation. In this paper we devise a dynamic version of Tree Adjoining Grammar (DVTAG) th...
We present a framework to apply Volterra series to analyze multilayered perceptrons trained to estimate the posterior probabilities of phonemes in automatic speech recognition. Th...
Joel Pinto, Garimella S. V. S. Sivaram, Hynek Herm...
This work surveys the potential for predicting demographic traits of individual speakers (gender, age, education level, ethnicity, and geographic region) using only word usage fea...