Nearly optimal exploration-exploitation decision thresholds

While trading off exploration and exploitation in reinforcement learning is hard in general, relatively simple solutions exist under some formulations. Optimal decision thresholds for the multi-armed bandit problem are derived, one for the infinite-horizon discounted reward case and one for the finite-horizon undiscounted reward case, which make explicit the link between the reward horizon, uncertainty, and the need for exploration. Two practical approximate algorithms follow from this result and are illustrated experimentally.
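To make the link between horizon and exploration concrete, here is a minimal sketch. It is not the paper's threshold or either of its algorithms. It assumes a toy setting chosen for illustration: one arm with a known mean reward, one Bernoulli arm with a Beta prior, and a finite undiscounted horizon, solved exactly by dynamic programming. The function name `explore_is_optimal` and its parameters are hypothetical.

```python
from functools import lru_cache


def explore_is_optimal(known_mean, alpha, beta, horizon):
    """Return True if pulling the uncertain arm first is strictly better
    than pulling the known arm first, under Bayes-optimal play afterwards.

    Illustrative assumptions (not taken from the paper):
      - the known arm pays `known_mean` in expectation on every pull;
      - the uncertain arm is Bernoulli with a Beta(alpha, beta) prior;
      - rewards are undiscounted over `horizon` pulls.
    """

    @lru_cache(maxsize=None)
    def value(a, b, t):
        # Bayes-optimal expected total reward with t pulls remaining
        # and posterior Beta(a, b) on the uncertain arm.
        if t == 0:
            return 0.0
        return max(pull_known(a, b, t), pull_uncertain(a, b, t))

    def pull_known(a, b, t):
        # The known arm yields no information, so the posterior is unchanged.
        return known_mean + value(a, b, t - 1)

    def pull_uncertain(a, b, t):
        p = a / (a + b)  # posterior mean of the uncertain arm
        return p * (1.0 + value(a + 1, b, t - 1)) + (1.0 - p) * value(a, b + 1, t - 1)

    return pull_uncertain(alpha, beta, horizon) > pull_known(alpha, beta, horizon)


if __name__ == "__main__":
    # Uncertain arm has posterior mean 0.5, below the known arm's 0.6.
    for horizon in (1, 2, 5, 20):
        print(horizon, explore_is_optimal(0.6, 1, 1, horizon))
```

In this toy setting the uncertain arm looks worse on average, so with a single pull left it is never worth trying, while with a long enough horizon the information gained can pay for itself. The horizon at which the decision flips plays the role of a decision threshold in this sketch only.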

Christos Dimitrakakis

CORR 2006 | Education | Infinite Horizon | Multi-armed Bandit Problem | Reward Horizon |

posted by olethros

Post Info

Added	11 Dec 2010
Updated	15 Dec 2011
Type	Journal
Year	2006
Where	CORR
Authors	Christos Dimitrakakis

	Complexity of Stochastic Branch and Bound Methods for Belief Tree Search in Bayesian Reinforcement Learning 509 views
	Reid et al.'s Distance Bounding Protocol and Mafia Fraud Attacks over Noisy Channels 545 views
	Rollout Sampling Approximate Policy Iteration 334 views
	Bayesian variable order Markov models. 404 views
	Statistical Decision Making for Authentication and Intrusion Detection 634 views
