Phoneme set clustering of accurate modeling is important in the task of multilingual speech recognition, especially when each of the available language training corpora is mismatc...
Many successful models for predicting attention in a scene involve three main steps: convolution with a set of filters, a center-surround mechanism and spatial pooling to constru...
Naila Murray, Maria Vanrell, Xavier Otazu, C. Alej...
This paper presents a graphical model for learning and recognizing human actions. Specifically, we propose to encode actions in a weighted directed graph, referred to as action gra...
For TV and radio shows containing narrowband speech, Speech-to-text (STT) accuracy on the narrowband audio can be improved by using an acoustic model trained on acoustically match...
Support vector machines (SVMs) have proven to be a powerful technique for pattern classification. SVMs map inputs into a high dimensional space and then separate classes with a hy...
William M. Campbell, Joseph P. Campbell, Douglas A...