General Discounting Versus Average Reward

Consider an agent interacting with an environment in cycles. In every interaction cycle the agent is rewarded for its performance. We compare the average reward U from cycle 1 to m (average value) with the future discounted reward V from cycle k to ∞ (discounted value). We consider essentially arbitrary (non-geometric) discount sequences and arbitrary reward sequences (non-MDP environments). We show that asymptotically U for m → ∞ and V for k → ∞ are equal, provided both limits exist. Further, if the effective horizon grows linearly with k or faster, then the existence of the limit of U implies that the limit of V exists. Conversely, if the effective horizon grows linearly with k or slower, then existence of the limit of V implies that the limit of U exists.
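Made concrete, the two quantities compared in the abstract are the average value U_{1m} = (1/m) Σ_{i=1}^{m} r_i and the discount-normalized value V_k = Σ_{i=k}^{∞} γ_i r_i / Σ_{i=k}^{∞} γ_i. The Python sketch below is not from the paper: it assumes these standard definitions, truncates the infinite sums at a large finite horizon, and uses illustrative reward and discount sequences (a geometric discount as the classical baseline, and a power discount i^{-2} as one non-geometric example of the kind the abstract targets).

    def average_value(r, m):
        # U_{1m}: average reward over cycles 1..m
        return sum(r(i) for i in range(1, m + 1)) / m

    def discounted_value(r, gamma, k, horizon=100_000):
        # V_k: discount-normalized future reward from cycle k onward,
        # with the infinite tail truncated at a large finite horizon
        num = sum(gamma(i) * r(i) for i in range(k, k + horizon))
        den = sum(gamma(i) for i in range(k, k + horizon))
        return num / den

    r = lambda i: 1.0 - 1.0 / i    # reward sequence whose average tends to 1
    geo = lambda i: 0.999 ** i     # geometric discount (bounded effective horizon)
    power = lambda i: i ** -2.0    # power discount (non-geometric, horizon grows with k)

    print(average_value(r, 100_000))           # -> close to 1
    print(discounted_value(r, geo, 10_000))    # -> close to 1
    print(discounted_value(r, power, 10_000))  # -> close to 1

All three printed values come out near 1, consistent with the claim that U and V share a limit when both exist. For the power discount i^{-2}, the effective horizon grows roughly linearly with k, which is the regime in which, per the abstract, existence of the limit of U already implies existence of the limit of V.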
Type Conference
Year 2006
Where ALT
Publisher Springer
Authors Marcus Hutter