A multi-stream ASR framework for BLSTM modeling of conversational speech



We propose a novel multi-stream framework for continuous conversational speech recognition that employs bidirectional Long Short-Term Memory (BLSTM) networks for phoneme prediction. The BLSTM architecture allows recurrent neural nets to model long-range context, which led to improved ASR performance when combined with conventional triphone modeling in a Tandem system. In this paper, we extend the principle of joint BLSTM and triphone modeling to a multi-stream system that uses MFCC features and BLSTM predictions as observations originating from two independent data streams. Using the COSINE database, we show that this technique outperforms a recently proposed single-stream Tandem system as well as a conventional HMM recognizer.
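To make the idea of two independent observation streams concrete, here is a minimal sketch of how a single HMM state might score one frame. It assumes that stream independence lets the emission likelihood factor into a product over streams, so the log-likelihoods add. The diagonal Gaussian for the MFCC stream, the discrete probability table for the BLSTM phoneme prediction, the stream weights `w_mfcc` and `w_blstm`, and all names and numbers are illustrative assumptions, not details taken from the paper.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def multistream_log_likelihood(mfcc, phoneme_id, state, w_mfcc=1.0, w_blstm=1.0):
    """Weighted sum of per-stream log-likelihoods for one HMM state.

    Stream 1: continuous MFCC vector, scored by a Gaussian (assumption).
    Stream 2: discrete BLSTM phoneme prediction, scored by a lookup table
    of per-state phoneme probabilities (assumption).
    """
    log_p_mfcc = log_gaussian_diag(mfcc, state["mean"], state["var"])
    log_p_blstm = math.log(state["phoneme_probs"][phoneme_id])
    return w_mfcc * log_p_mfcc + w_blstm * log_p_blstm


# Hypothetical toy state: 2-dimensional features, 3 phoneme classes.
state = {
    "mean": [0.0, 0.0],
    "var": [1.0, 1.0],
    "phoneme_probs": [0.7, 0.2, 0.1],
}

# Same acoustic frame, but the BLSTM predicts a likely vs. an unlikely phoneme.
score_match = multistream_log_likelihood([0.0, 0.0], 0, state)
score_mismatch = multistream_log_likelihood([0.0, 0.0], 2, state)
```

In this toy setup the state scores higher when the BLSTM prediction agrees with the phoneme it expects, even though the MFCC frame is identical. Raising or lowering `w_blstm` changes how strongly the prediction stream influences the combined score.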

Martin Wöllmer, Florian Eyben, Björn Schuller, Gerhard Rigoll


Continuous Conversational Speech | ICASSP 2011 | Recurrent Neural Nets | Signal Processing | Triphone Modeling


» Creating conversational interfaces for children

» Phonetic pronunciations for Arabic speech-to-text systems

Post Info

Added	20 Aug 2011
Updated	20 Aug 2011
Type	Conference
Year	2011
Where	ICASSP
Authors	Martin Wöllmer, Florian Eyben, Björn Schuller, Gerhard Rigoll
