We introduce new, efficient algorithms for value iteration with multiple reward functions and continuous state. We also give an algorithm for finding the set of all nondominated a...
Daniel J. Lizotte, Michael H. Bowling, Susan A. Mu...
This review considers the theoretical problems facing agents that must learn and choose on the basis of reward or reinforcement that is uncertain or delayed, in implicit or proced...
er provides new techniques for abstracting the state space of a Markov Decision Process (MDP). These techniques extend one of the recent minimization models, known as -reduction, ...
When the transition probabilities and rewards of a Markov Decision Process are specified exactly, the problem can be solved without any interaction with the environment. When no s...
This paper discusses theoretical and experimental aspects of gradient-based approaches to the direct optimization of policy performance in controlled ??? ?s. We introduce ??? ?, a...