Joint encoding of the waveform and speech recognition features using a transform codec


We propose a new transform speech codec that jointly encodes a wideband waveform and its corresponding wideband and narrowband speech recognition features. For distributed speech recognition, wideband features are compressed and transmitted as side information. The waveform is then encoded in a manner that exploits the information already captured by the speech features. Narrowband acoustic features can be synthesized at the server by applying a transformation to the decoded wideband features. An evaluation conducted on an in-car speech recognition task shows that at 16 kbps our new system typically has essentially no impact on word error rate compared to uncompressed audio, whereas the standard transform codec produces up to a 20% increase in word error rate. In addition, good-quality speech is obtained for playback and transcription, with PESQ scores ranging from 3.2 to 3.4.
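The abstract reports its results as word error rate (WER). As a minimal sketch of how that metric is commonly computed (word-level edit distance divided by the number of reference words), the following may help. The function names `word_error_rate` and `relative_increase` and the example sentences are illustrative assumptions, not taken from the paper, and the abstract does not state whether its 20% figure is relative or absolute.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration; not the paper's scoring tool.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_increase(baseline: float, degraded: float) -> float:
    """Relative change of a degraded WER with respect to a nonzero baseline."""
    return (degraded - baseline) / baseline


# Illustrative sentences only: one deleted word out of four reference words.
print(word_error_rate("turn on the radio", "turn on radio"))  # 0.25
```

Under this definition, a codec that leaves recognition unaffected would yield the same WER on decoded audio as on uncompressed audio, which is the comparison the abstract describes.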

Xing Fan, Michael L. Seltzer, Jasha Droppo, Henriq


Acoustic Features | ICASSP 2011 | Signal Processing | Speech Codec | Word Error Rate |


» A comparison of approaches for modeling prosodic features in speaker recognition

» Histogram-Based Quantization for Robust and/or Distributed Speech Recognition

» Robust automatic speech recognition with decoder oriented ideal binary mask estimation

Post Info

Added	29 Aug 2011
Updated	29 Aug 2011
Type	Journal
Year	2011
Where	ICASSP
Authors	Xing Fan, Michael L. Seltzer, Jasha Droppo, Henrique S. Malvar, Alex Acero
