Polyphase speech recognition

We propose a model for speech recognition that consists of multiple semi-synchronized recognizers operating on a polyphase decomposition of standard speech features. Specifically, we treat multiple out-of-phase downsampled versions of the speech features as separate streams, which are modeled separately at the lowest level and then integrated at a higher level (words) during first-pass decoding. Our model lessens the severity of the oversampling problem in many speech recognition systems, namely that speech modulation energy is most important below 25 Hz, whereas a 100 Hz frame rate gives a modulation bandwidth of 50 Hz. Moreover, our polyphase approach captures wider and more diverse dynamics within the speech signal. Our integrative network is high-level: it couples together and decodes word strings from the different recognizers simultaneously and asynchronously. We provide preliminary results on the 10-word vocabulary version of the SVitchboard (small-vocabulary Switchboard) task and show ...
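The polyphase decomposition described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `polyphase_streams` and `modulation_bandwidth_hz`, the downsampling factor of 2, and the use of plain decimation without any anti-aliasing filter are all assumptions made for illustration. The sketch only shows how out-of-phase downsampled streams relate to the original frame sequence and how the modulation bandwidth scales with the frame rate.

```python
import numpy as np


def polyphase_streams(features, num_phases):
    """Split a (frames, dims) feature matrix into out-of-phase downsampled streams.

    Stream k keeps frames k, k + num_phases, k + 2 * num_phases, ...
    Hypothetical helper: plain decimation, no anti-aliasing filter.
    """
    features = np.asarray(features)
    return [features[phase::num_phases] for phase in range(num_phases)]


def modulation_bandwidth_hz(frame_rate_hz, num_phases=1):
    """Nyquist modulation bandwidth of one stream: half its effective frame rate."""
    return frame_rate_hz / num_phases / 2.0


# Toy input: 10 frames of 3-dimensional features (stand-ins for real speech features).
frames = np.arange(10 * 3).reshape(10, 3)

# Assumed downsampling factor of 2: one stream of even frames, one of odd frames.
streams = polyphase_streams(frames, 2)

full_rate_bandwidth = modulation_bandwidth_hz(100.0)      # 100 Hz frame rate
per_stream_bandwidth = modulation_bandwidth_hz(100.0, 2)  # each stream at 50 Hz
```

Under these assumptions, each stream runs at half the original frame rate, so its modulation bandwidth drops from 50 Hz to 25 Hz, while the streams together still cover every original frame. How the per-stream recognizers are coupled at the word level is not shown here.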
Hui Lin, Jeff Bilmes
Added 30 May 2010
Updated 30 May 2010
Type Conference
Year 2008
Authors Hui Lin, Jeff Bilmes