Binary coding of speech spectrograms using a deep auto-encoder



This paper reports our recent exploration of the layer-by-layer learning strategy for training a multi-layer generative model of patches of speech spectrograms. The top layer of the generative model learns binary codes that can be used for efficient compression of speech and could also be used for scalable speech recognition or rapid speech content retrieval. Each layer of the generative model is fully connected to the layer below, and the weights on these connections are pretrained efficiently by using the contrastive divergence approximation to the log likelihood gradient. After layer-by-layer pre-training, we "unroll" the generative model to form a deep auto-encoder, whose parameters are then fine-tuned using back-propagation. To reconstruct the full-length speech spectrogram, individual spectrogram segments predicted by their respective binary codes are combined using an overlap-and-add method. Experimental results on speech spectrogram coding demonstrate that the binary cod...
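
The abstract names two concrete mechanisms: contrastive-divergence pretraining of each layer, and overlap-and-add recombination of decoded segments. The following is a minimal, illustrative sketch of both in plain Python, not the authors' implementation. Several choices here are assumptions. It uses a Bernoulli-Bernoulli RBM updated with CD-1 (a single Gibbs step). It forms binary codes by thresholding top-layer probabilities at 0.5. It combines overlapping segments by plain averaging of overlapping frames. All function names (`cd1_update`, `binary_code`, `overlap_add`) are hypothetical. The unrolling and back-propagation fine-tuning stage is not shown.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def hidden_probs(v, W, b_h):
    # p(h_j = 1 | v) = sigmoid(b_h[j] + sum_i v[i] * W[i][j])
    return [sigmoid(b_h[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(b_h))]

def visible_probs(h, W, b_v):
    # p(v_i = 1 | h) = sigmoid(b_v[i] + sum_j h[j] * W[i][j])
    return [sigmoid(b_v[i] + sum(h[j] * W[i][j] for j in range(len(h))))
            for i in range(len(b_v))]

def sample(probs, rng):
    # Draw binary states from Bernoulli probabilities.
    return [1.0 if rng.random() < p else 0.0 for p in probs]

def cd1_update(v0, W, b_v, b_h, lr, rng):
    """One CD-1 step on a single visible vector v0; updates W, b_v, b_h in place.

    Assumption: Bernoulli-Bernoulli RBM and CD with k = 1.
    Returns the squared reconstruction error, a rough progress monitor.
    """
    ph0 = hidden_probs(v0, W, b_h)        # positive phase
    h0 = sample(ph0, rng)
    pv1 = visible_probs(h0, W, b_v)       # mean-field reconstruction
    ph1 = hidden_probs(pv1, W, b_h)       # negative phase
    for i in range(len(b_v)):
        for j in range(len(b_h)):
            W[i][j] += lr * (v0[i] * ph0[j] - pv1[i] * ph1[j])
        b_v[i] += lr * (v0[i] - pv1[i])
    for j in range(len(b_h)):
        b_h[j] += lr * (ph0[j] - ph1[j])
    return sum((a - b) ** 2 for a, b in zip(v0, pv1))

def binary_code(v, W, b_h):
    # Hypothetical coding rule: threshold hidden probabilities at 0.5.
    return [1 if p > 0.5 else 0 for p in hidden_probs(v, W, b_h)]

def overlap_add(segments, hop):
    """Combine equal-length segments (lists of frames, each a list of floats)
    placed `hop` frames apart, averaging frames where segments overlap."""
    seg_len = len(segments[0])
    n_bins = len(segments[0][0])
    total = hop * (len(segments) - 1) + seg_len
    acc = [[0.0] * n_bins for _ in range(total)]
    count = [0] * total
    for k, seg in enumerate(segments):
        for t, frame in enumerate(seg):
            row = acc[k * hop + t]
            for f, x in enumerate(frame):
                row[f] += x
            count[k * hop + t] += 1
    return [[x / count[t] for x in acc[t]] for t in range(total)]
```

For example, `overlap_add([[[1.0], [2.0]], [[4.0], [6.0]]], hop=1)` yields `[[1.0], [3.0], [6.0]]`: the middle frame is covered by both segments and is averaged. Averaging is only one normalization choice. A windowed overlap-add would weight frames differently, and the abstract does not specify which variant the paper uses.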

Li Deng, Michael L. Seltzer, Dong Yu, Alex Acero, Abdel-rahman Mohamed, Geoffrey E. Hinton


Binary Codes | Generative Model | INTERSPEECH 2010 | Signal Processing | Speech Spectrogram


Post Info

Added	18 May 2011
Updated	18 May 2011
Type	Conference
Year	2010
Where	INTERSPEECH
Authors	Li Deng, Michael L. Seltzer, Dong Yu, Alex Acero, Abdel-rahman Mohamed, Geoffrey E. Hinton
