Combining monaural source separation with Long Short-Term Memory for increased robustness in vocalist gender recognition


We present a novel combination of algorithms for detecting the gender of the leading vocalist in recorded popular music. Our previous approach increased accuracy by enhancing the harmonic parts with Non-Negative Matrix Factorization (NMF). Building on it, we first integrate a new source separation algorithm specifically tailored to extracting the leading voice from monaural recordings. Second, we introduce Bidirectional Long Short-Term Memory Recurrent Neural Networks (BLSTM-RNNs), which have recently been applied with great success to Music Information Retrieval tasks, as context-sensitive classifiers for this scenario. Compared with a baseline approach using Hidden Naive Bayes on the original recordings, the combination of leading voice separation and BLSTM networks improves the accuracy of simultaneous detection of vocal presence and vocalist gender at beat level by up to

Felix Weninger, Jean-Louis Durrieu, Florian Eyben, Gaël Richard, Björn Schuller


ICASSP 2011 | Long Short-Term Memory | Non-Negative Matrix Factorization | Recurrent Neural Networks | Signal Processing |


Post Info

Added	20 Aug 2011
Updated	20 Aug 2011
Type	Conference
Year	2011
Where	ICASSP
Authors	Felix Weninger, Jean-Louis Durrieu, Florian Eyben, Gaël Richard, Björn Schuller
