Sciweavers

771 search results - page 9 / 155
» Markov Decision Processes with Arbitrary Reward Processes
Sort
View
114
Voted
FOCS
2007
IEEE
15 years 6 months ago
Approximation Algorithms for Partial-Information Based Stochastic Control with Markovian Rewards
We consider a variant of the classic multi-armed bandit problem (MAB), which we call FEEDBACK MAB, where the reward obtained by playing each of n independent arms varies according...
Sudipto Guha, Kamesh Munagala
IJCAI
2007
15 years 1 months ago
Bayesian Inverse Reinforcement Learning
Inverse Reinforcement Learning (IRL) is the problem of learning the reward function underlying a Markov Decision Process given the dynamics of the system and the behaviour of an e...
Deepak Ramachandran, Eyal Amir
IJCAI
2001
15 years 1 months ago
Complexity of Probabilistic Planning under Average Rewards
A general and expressive model of sequential decision making under uncertainty is provided by the Markov decision processes (MDPs) framework. Complex applications with very large ...
Jussi Rintanen
AAAI
2010
15 years 1 months ago
Robust Policy Computation in Reward-Uncertain MDPs Using Nondominated Policies
The precise specification of reward functions for Markov decision processes (MDPs) is often extremely difficult, motivating research into both reward elicitation and the robust so...
Kevin Regan, Craig Boutilier
JMLR
2010
189views more  JMLR 2010»
14 years 6 months ago
Adaptive Step-size Policy Gradients with Average Reward Metric
In this paper, we propose a novel adaptive step-size approach for policy gradient reinforcement learning. A new metric is defined for policy gradients that measures the effect of ...
Takamitsu Matsubara, Tetsuro Morimura, Jun Morimot...