We address the problem of automatically constructing basis functions for linear approximation of the value function of a Markov Decision Process (MDP). Our work builds on results ...
When the transition probabilities and rewards of a Markov Decision Process are specified exactly, the problem can be solved without any interaction with the environment. When no s...
This paper discusses theoretical and experimental aspects of gradient-based approaches to the direct optimization of policy performance in controlled ??? ?s. We introduce ??? ?, a...
Abstract--In the Relational Reinforcement learning framework, we propose an algorithm that learns an action model allowing to predict the resulting state of each action in any give...
Imitation is actively being studied as an effective means of learning in multi-agent environments. It allows an agent to learn how to act well (perhaps optimally) by passively obs...