The fusion of information from heterogenous sensors is crucial to the effectiveness of a multimodal system. Noise affect the sensors of different modalities independently. A good ...
Shankar T. Shivappa, Bhaskar D. Rao, Mohan M. Triv...
We present an approach to detecting and recognizing spoken isolated phrases based solely on visual input. We adopt an architecture that first employs discriminative detection of ...
Kate Saenko, Karen Livescu, Michael Siracusa, Kevi...
Phonetic decision trees are a key concept in acoustic modeling for large vocabulary continuous speech recognition. Although discriminative training has become a major line of rese...
— A solution for the slow convergence of most learning rules for Recurrent Neural Networks (RNN) has been proposed under the terms Liquid State Machines (LSM) and Echo State Netw...
David Verstraeten, Benjamin Schrauwen, Dirk Stroob...
Training accurate acoustic models typically requires a large amount of transcribed data, which can be expensive to obtain. In this paper, we describe a novel semi-supervised learn...
Balakrishnan Varadarajan, Dong Yu, Li Deng, Alex A...