Abstract— We describe a general method to transform a non-Markovian sequential decision problem into a supervised learning problem using a K-bestpaths algorithm. We consider an a...
—This paper presents a method for learning decision theoretic models of human behaviors from video data. Our system learns relationships between the movements of a person, the co...
Partially observable Markov decision processes (POMDPs) are widely used for planning under uncertainty. In many applications, the huge size of the POMDP state space makes straightf...
Joni Pajarinen, Jaakko Peltonen, Ari Hottinen, Mik...
We investigate the possibility to apply a known machine learning algorithm of Q-learning in the domain of a Virtual Learning Environment (VLE). It is important in this problem doma...
In this paper, we propose a novel adaptive step-size approach for policy gradient reinforcement learning. A new metric is defined for policy gradients that measures the effect of ...
Takamitsu Matsubara, Tetsuro Morimura, Jun Morimot...