Monaural speech separation based on MAXVQ and CASA for robust speech recognition

Robustness is one of the most important topics in automatic speech recognition (ASR) for practical applications. Monaural speech separation based on computational auditory scene analysis (CASA) offers a solution to this problem. In this paper, a novel system is presented for separating the monaural speech of two talkers. Gaussian mixture models (GMMs) and vector quantizers (VQs) are used to learn the grouping cues from isolated clean data for each speaker. Given an utterance, speaker identification is performed first to identify the two speakers present in the utterance; the factorial-max vector quantization model (MAXVQ) is then used to infer the mask signals, and finally the utterance of the target speaker is resynthesized in the CASA framework. Recognition results on the 2006 speech separation challenge corpus show that the proposed system significantly improves the robustness of ASR.
Peng Li, Yong Guan, Shijin Wang, Bo Xu, Wenju Liu
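
To make the mask-inference step concrete, below is a minimal Python sketch of factorial-max inference over two per-speaker VQ codebooks, assuming the codewords are log-power spectra. The function name infer_mask, the codebook shapes, and the exhaustive codeword-pair search are illustrative assumptions, not the authors' implementation, which additionally performs speaker identification and CASA-based resynthesis.

import numpy as np

def infer_mask(Y, codebook_a, codebook_b):
    """Frame-by-frame factorial-max (MAXVQ-style) mask inference.

    Y          : (T, F) log-power spectrogram of the two-talker mixture
    codebook_a : (Ka, F) VQ codewords (log spectra) for the target speaker
    codebook_b : (Kb, F) VQ codewords for the interfering speaker
    Returns a (T, F) binary mask that is 1 where the target speaker is
    estimated to dominate the mixture.
    """
    # MAX approximation: in the log-power domain, the mixture of two
    # sources is roughly the element-wise max of their individual spectra.
    approx = np.maximum(codebook_a[:, None, :], codebook_b[None, :, :])  # (Ka, Kb, F)
    mask = np.zeros(Y.shape, dtype=bool)
    for t in range(Y.shape[0]):
        # Exhaustive search over codeword pairs for the best explanation
        # of this frame under the max model.
        err = ((approx - Y[t]) ** 2).sum(axis=-1)           # (Ka, Kb)
        i, j = np.unravel_index(np.argmin(err), err.shape)
        # The target owns the time-frequency units where its codeword
        # is the louder of the pair.
        mask[t] = codebook_a[i] >= codebook_b[j]
    return mask

# Toy usage with random data; a real system would train the codebooks on
# isolated clean speech for the two speakers identified in the utterance.
rng = np.random.default_rng(0)
mixture = rng.normal(size=(50, 64))
mask = infer_mask(mixture, rng.normal(size=(32, 64)), rng.normal(size=(32, 64)))

The exhaustive search costs O(Ka x Kb) distance evaluations per frame; practical factorial-VQ systems typically prune candidate pairs or use approximate inference to keep this tractable.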
Type Journal
Year 2010
Where CSL
Publisher Springer