Sciweavers

ALT 2007, Springer
Pseudometrics for State Aggregation in Average Reward Markov Decision Processes
We consider how state similarity in average reward Markov decision processes (MDPs) may be described by pseudometrics. Introducing the notion of adequate pseudometrics which are we...
Ronald Ortner

JMLR 2010
Adaptive Step-size Policy Gradients with Average Reward Metric
In this paper, we propose a novel adaptive step-size approach for policy gradient reinforcement learning. A new metric is defined for policy gradients that measures the effect of ...
Takamitsu Matsubara, Tetsuro Morimura, Jun Morimot...

IJCAI 2001
Complexity of Probabilistic Planning under Average Rewards
A general and expressive model of sequential decision making under uncertainty is provided by the Markov decision processes (MDPs) framework. Complex applications with very large ...
Jussi Rintanen

QEST 2010, IEEE
Symblicit Calculation of Long-Run Averages for Concurrent Probabilistic Systems
Model checkers for concurrent probabilistic systems have become very popular within the last decade. The study of long-run average behavior has however received only scan...
Ralf Wimmer, Bettina Braitling, Bernd Becker, Erns...

FOCS 2007, IEEE
Approximation Algorithms for Partial-Information Based Stochastic Control with Markovian Rewards
We consider a variant of the classic multi-armed bandit problem (MAB), which we call FEEDBACK MAB, where the reward obtained by playing each of n independent arms varies according...
Sudipto Guha, Kamesh Munagala