Abstract. Opposing the pre-dominant turn-wise statistics of acoustic LowLevel-Descriptors followed by static classification we re-investigate dynamic modeling directly on the frame...
Untethered multimodal interfaces are more attractive than tethered ones because they are more natural and expressive for interaction. Such interfaces usually require robust vision...
We present a data-driven framework for expanding the lexicon to improve Mandarin broadcast news and conversation speech recognition. The lexicon expansion includes the generation ...
Features derived from Multi-Layer Perceptrons (MLPs) are becoming increasingly popular for speech recognition. This paper describes various schemes for applying these features to ...
J. Park, Frank Diehl, M. J. F. Gales, Marcus Tomal...
Audio segmentation is an essential preprocessing step in several audio processing applications with a significant impact e.g. on speech recognition performance. We introduce a no...