Robust discriminative keyword spotting for emotionally colored spontaneous speech using bidirectional LSTM networks

In this paper we propose a new technique for robust keyword spotting that uses bidirectional Long Short-Term Memory (BLSTM) recurrent neural nets to incorporate contextual information in speech decoding. Our approach overcomes the drawbacks of generative HMM modeling by applying a discriminative learning procedure that non-linearly maps speech features into an abstract vector space. By incorporating the outputs of a BLSTM network into the speech features, it is able to make use of past and future context for phoneme predictions. The robustness of the approach is evaluated on a keyword spotting task using the HUMAINE Sensitive Artificial Listener (SAL) database, which contains accented, spontaneous, and emotionally colored speech. The test is particularly stringent because the system is not trained on the SAL database, but only on the TIMIT corpus of read speech. We show that our method prevails over a discriminative keyword spotter without BLSTM-enhanced feature functions, which in turn has ...

Martin Wöllmer, Florian Eyben, Joseph Keshet, Alex Graves, Björn Schuller, Gerhard Rigoll

Discriminative Learning Procedure | ICASSP 2009 | Keyword Spotting | Signal Processing | Speech Features

Added	21 May 2010
Updated	21 May 2010
Type	Conference
Year	2009
Where	ICASSP
Authors	Martin Wöllmer, Florian Eyben, Joseph Keshet, Alex Graves, Björn Schuller, Gerhard Rigoll
