Sciweavers

81 search results - page 3 / 17
» The Optimal Reward Baseline for Gradient-Based Reinforcement...
Sort
View
CSL
2012
Springer
12 years 1 months ago
Reinforcement learning for parameter estimation in statistical spoken dialogue systems
Reinforcement techniques have been successfully used to maximise the expected cumulative reward of statistical dialogue systems. Typically, reinforcement learning is used to estim...
Filip Jurcícek, Blaise Thomson, Steve Young
COLT
2004
Springer
13 years 11 months ago
Reinforcement Learning for Average Reward Zero-Sum Games
Abstract. We consider Reinforcement Learning for average reward zerosum stochastic games. We present and analyze two algorithms. The first is based on relative Q-learning and the ...
Shie Mannor
AI
1998
Springer
13 years 5 months ago
Model-Based Average Reward Reinforcement Learning
Reinforcement Learning (RL) is the study of programs that improve their performance by receiving rewards and punishments from the environment. Most RL methods optimize the discoun...
Prasad Tadepalli, DoKyeong Ok
CORR
2006
Springer
140views Education» more  CORR 2006»
13 years 5 months ago
Nearly optimal exploration-exploitation decision thresholds
While in general trading off exploration and exploitation in reinforcement learning is hard, under some formulations relatively simple solutions exist. Optimal decision thresholds ...
Christos Dimitrakakis
ICML
2006
IEEE
14 years 6 months ago
An intrinsic reward mechanism for efficient exploration
How should a reinforcement learning agent act if its sole purpose is to efficiently learn an optimal policy for later use? In other words, how should it explore, to be able to exp...
Özgür Simsek, Andrew G. Barto