The output of a speech recognition system is not always ideal for subsequent downstream processing, in part because speakers themselves often make mistakes. A system would accompl...
Real-time transcription provides deaf and hard of hearing people visual access to spoken content, such as classroom instruction, and other live events. Currently, the only reliabl...
Walter S. Lasecki, Christopher D. Miller, Donato B...
This paper presents a method for solving the permutation problem of frequency domain blind source separation (BSS) when the number of source signals is large, and the potential sou...
This paper addresses selecting between candidate pronunciations for out-of-vocabulary words in speech processing tasks. We introduce a simple, unsupervised method that outperforms...
Christopher M. White, Abhinav Sethy, Bhuvana Ramab...
An image-based approach provides an efficient way for visual speech synthesis. In an image-based visual speech synthesis system, a few lip images, namely visemes, are used for ge...