Polyphase speech recognition

15 years 7 months ago

Download ssli.ee.washington.edu

We propose a model for speech recognition that consists of multiple semi-synchronized recognizers operating on a polyphase decomposition of standard speech features. Specifically, we treat multiple out-of-phase downsampled versions of the speech features as separate streams, which are modeled separately at the lowest level and then integrated at a higher level (words) during first-pass decoding. Our model lessens the severity of the oversampling problem found in many speech recognition systems: speech modulation energy is most important below 25 Hz, yet a 100 Hz frame rate gives a modulation bandwidth of 50 Hz. Our polyphase approach moreover captures wider and more diverse dynamics within the speech signal. Our integrative network is high-level; that is, it couples together and decodes word strings from different recognizers simultaneously and asynchronously. We provide preliminary results on the 10-word vocabulary version of the SVitchboard (small-vocabulary Switchboard) task and show ...
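To make the polyphase decomposition concrete, the following is a minimal sketch of how out-of-phase downsampled streams could be formed from a sequence of feature frames. The function names and the choice of two phases are assumptions made for illustration; the abstract does not state the downsampling factor or any implementation details.

```python
def polyphase_streams(frames, num_phases):
    """Split a frame sequence into out-of-phase downsampled streams.

    Stream k keeps frames k, k + num_phases, k + 2 * num_phases, ...
    (hypothetical helper; not taken from the paper).
    """
    if num_phases < 1:
        raise ValueError("num_phases must be at least 1")
    return [frames[phase::num_phases] for phase in range(num_phases)]


def modulation_bandwidth_hz(frame_rate_hz, num_phases=1):
    """Nyquist modulation bandwidth of one stream after downsampling."""
    return frame_rate_hz / num_phases / 2.0


# Ten toy one-dimensional "feature frames", assumed to arrive at 100 Hz.
frames = [[float(t)] for t in range(10)]

# Assumed factor of 2: each stream then runs at 50 frames per second.
streams = polyphase_streams(frames, num_phases=2)
```

Under these assumptions, each stream has a modulation bandwidth of 25 Hz instead of 50 Hz, and together the streams still cover every original frame. How the per-stream recognizers are coupled at the word level is not shown here.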

Hui Lin, Jeff Bilmes
