
Robust Bayesian reinforcement learning through tight lower bounds

In the Bayesian approach to sequential decision making, exact calculation of the (subjective) utility is intractable. This extends to most special cases of interest, such as reinforcement learning problems. While utility bounds are known to exist for this problem, so far none of them has been particularly tight. In this paper, we show how to efficiently calculate a lower bound, which corresponds to the utility of a near-optimal memoryless policy for the decision problem. This policy is generally different from both the Bayes-optimal policy and the policy that is optimal for the expected MDP under the current belief. We then show how these bounds can be applied to obtain robust exploration policies in a Bayesian reinforcement learning setting.
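The lower-bound idea in the abstract rests on a general fact: the expected utility of any fixed policy under the current belief is a lower bound on the Bayes-optimal utility. Below is a minimal sketch of that fact, not the paper's construction (its near-optimal memoryless policy is computed differently). It assumes a Dirichlet belief over the transition kernel of a toy two-state MDP and estimates the bound by Monte Carlo; the reward matrix, the uniform evaluation policy, and all names are illustrative assumptions.

```python
# Sketch (not the paper's algorithm): the Bayesian utility of ANY fixed
# memoryless policy lower-bounds the Bayes-optimal utility. We estimate it
# by sampling MDPs from a Dirichlet belief, evaluating the fixed policy
# exactly on each sample, and averaging. Toy 2-state setup is assumed.
import numpy as np

n_states, n_actions, gamma = 2, 2, 0.95
rng = np.random.default_rng(0)

# Dirichlet posterior parameters over next-state distributions, one per (s, a).
dirichlet_alpha = np.ones((n_states, n_actions, n_states))
# Rewards assumed known for simplicity: R[s, a].
R = np.array([[0.0, 1.0],
              [1.0, 0.0]])

def sample_mdp(alpha, rng):
    """Draw one transition kernel P[s, a, s'] from the Dirichlet belief."""
    P = np.empty_like(alpha)
    for s in range(n_states):
        for a in range(n_actions):
            P[s, a] = rng.dirichlet(alpha[s, a])
    return P

def policy_value(P, policy):
    """Exact value of a memoryless policy pi(a|s) in MDP P:
    solve (I - gamma * P_pi) V = R_pi."""
    P_pi = np.einsum("sa,sat->st", policy, P)   # state transition kernel under pi
    R_pi = np.einsum("sa,sa->s", policy, R)     # expected reward under pi
    return np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)

# A fixed memoryless policy to evaluate; uniform purely for illustration.
pi = np.full((n_states, n_actions), 1.0 / n_actions)

# Monte Carlo estimate of E_belief[V^pi], a lower bound on the Bayes-optimal utility.
values = [policy_value(sample_mdp(dirichlet_alpha, rng), pi) for _ in range(1000)]
lower_bound = np.mean(values, axis=0)
print("Estimated lower bound on Bayes-optimal utility per state:", lower_bound)
```

Tightening this bound then amounts to choosing a better fixed policy to evaluate; the paper's contribution is an efficient way to find a near-optimal memoryless one.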
Type: Conference
Year: 2011
Where: EWRL
Authors: Christos Dimitrakakis