Sciweavers

TASLP
2016

Boosting Contextual Information for Deep Neural Network Based Voice Activity Detection

Voice activity detection (VAD) is an important topic in audio signal processing. Contextual information is important for improving the performance of VAD at low signal-to-noise ratios. Here we explore contextual information by machine learning methods at three levels. At the top level, we employ an ensemble learning framework, named multi-resolution stacking (MRS), which is a stack of ensemble classifiers. Each classifier in a building block inputs the concatenation of the predictions of its lower building blocks and the expansion of the raw acoustic feature by a given window (called a resolution). At the middle level, we describe a base classifier in MRS, named boosted deep neural network (bDNN). bDNN first generates multiple base predictions from different contexts of a single frame by only one DNN and then aggregates the base predictions for a better prediction of the frame, and it is different from computationally-expensive boosting methods that train ensembles of classifie...
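
The two ideas the abstract describes for bDNN, expanding a frame by a context window and aggregating several base predictions of the same frame, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the edge clamping, the output layout (one score per frame in the window), and the use of a plain average as the aggregation rule are all assumptions made for illustration, and the DNN itself is left out.

```python
def expand_context(features, half_window):
    """Concatenate each frame with its neighbors inside the window.

    Assumption: frames beyond the signal edges are replaced by the
    nearest valid frame (edge clamping).
    """
    n = len(features)
    expanded = []
    for t in range(n):
        row = []
        for offset in range(-half_window, half_window + 1):
            idx = min(max(t + offset, 0), n - 1)
            row.extend(features[idx])
        expanded.append(row)
    return expanded


def aggregate_base_predictions(window_outputs, half_window):
    """Average every base prediction that refers to the same frame.

    Assumption: window_outputs[t][k] is a hypothetical model's speech
    score for frame t + (k - half_window), computed from the context
    window centered at frame t.
    """
    n = len(window_outputs)
    sums = [0.0] * n
    counts = [0] * n
    for t, outputs in enumerate(window_outputs):
        for k, score in enumerate(outputs):
            target = t + k - half_window
            if 0 <= target < n:  # drop predictions for frames outside the signal
                sums[target] += score
                counts[target] += 1
    return [s / c for s, c in zip(sums, counts)]


# Three frames, window of one neighbor on each side (hypothetical scores).
scores = aggregate_base_predictions(
    [[0.0, 0.2, 0.4], [0.1, 0.6, 0.9], [0.7, 0.8, 0.0]], half_window=1
)
```

In this toy run the middle frame receives three base predictions (0.4, 0.6, 0.7), one from each window that covers it, while the edge frames receive two each. A single model therefore yields an ensemble-like average without training several classifiers.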
Xiao-Lei Zhang, DeLiang Wang
Added 10 Apr 2016
Updated 10 Apr 2016
Type Journal
Year 2016
Where TASLP
Authors Xiao-Lei Zhang, DeLiang Wang