PSIVT 2015, Springer

Multimodal Gesture Recognition Using Multi-stream Recurrent Neural Network

Abstract. In this paper, we present a novel method for multimodal gesture recognition based on neural networks. Our multi-stream recurrent neural network (MRNN) is a completely data-driven model that can be trained end to end without domain-specific hand engineering. The MRNN extends recurrent neural networks with Long Short-Term Memory cells (LSTM-RNNs), which facilitate the handling of variable-length gestures. We propose a recurrent approach for fusing multiple temporal modalities using multiple streams of LSTM-RNNs. In addition, we propose alternative fusion architectures and empirically evaluate the performance and robustness of these fusion strategies. Experimental results demonstrate that the proposed MRNN outperforms other state-of-the-art methods on the Sheffield Kinect Gesture (SKIG) dataset and is significantly more robust to noisy inputs.
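The abstract describes one LSTM stream per input modality, with the streams' states fused before classification. As a rough illustration of that idea, here is a minimal NumPy sketch of per-modality LSTM streams fused by concatenating their final hidden states; the parameter shapes, the late-fusion-by-concatenation choice, and all function names are illustrative assumptions, not the paper's exact MRNN architecture (the paper itself proposes and compares several fusion strategies).

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_stream(x_seq, W, U, b, hidden):
    """Run one LSTM stream over a variable-length sequence.

    x_seq: (T, input_dim) array; T may differ per modality.
    Returns the final hidden state, shape (hidden,).
    """
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for x in x_seq:
        z = W @ x + U @ h + b                  # all gates at once: (4*hidden,)
        i, f, o, g = np.split(z, 4)            # input, forget, output, candidate
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c = f * c + i * np.tanh(g)             # cell-state update
        h = o * np.tanh(c)                     # hidden-state update
    return h


def init_params(rng, input_dim, hidden):
    """Random LSTM parameters for one modality stream (illustrative init)."""
    scale = 0.1
    W = scale * rng.standard_normal((4 * hidden, input_dim))
    U = scale * rng.standard_normal((4 * hidden, hidden))
    b = np.zeros(4 * hidden)
    return W, U, b


def mrnn_forward(streams, params, W_out):
    """Late fusion: one LSTM per modality, concatenate final states, classify."""
    fused = np.concatenate([
        lstm_stream(x, W, U, b, hidden=b.size // 4)
        for x, (W, U, b) in zip(streams, params)
    ])
    logits = W_out @ fused
    e = np.exp(logits - logits.max())          # numerically stable softmax
    return e / e.sum()


# Usage: hypothetical RGB (8-dim) and depth (4-dim) feature streams,
# different lengths, fused into a 10-class gesture prediction.
rng = np.random.default_rng(0)
hidden, n_classes = 16, 10
params = [init_params(rng, 8, hidden), init_params(rng, 4, hidden)]
W_out = 0.1 * rng.standard_normal((n_classes, 2 * hidden))
streams = [rng.standard_normal((12, 8)), rng.standard_normal((9, 4))]
probs = mrnn_forward(streams, params, W_out)
```

Because each modality gets its own recurrence, the streams can have different feature dimensions and sequence lengths; only the fusion point (here, concatenation of final states) couples them.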
Added 16 Apr 2016
Updated 16 Apr 2016
Type Journal
Year 2015
Where PSIVT
Authors Noriki Nishida, Hideki Nakayama