Sciweavers

437 search results - page 3 / 88
» Policy Gradient Critics
JMLR
2010
13 years 8 days ago
Adaptive Step-size Policy Gradients with Average Reward Metric
In this paper, we propose a novel adaptive step-size approach for policy gradient reinforcement learning. A new metric is defined for policy gradients that measures the effect of ...
Takamitsu Matsubara, Tetsuro Morimura, Jun Morimot...
CORR
2006
Springer
13 years 5 months ago
A Unified View of TD Algorithms; Introducing Full-Gradient TD and Equi-Gradient Descent TD
This paper addresses the issue of policy evaluation in Markov Decision Processes, using linear function approximation. It provides a unified view of algorithms such as TD(λ), LSTD(λ)...
Manuel Loth, Philippe Preux
ICAC
2008
IEEE
13 years 12 months ago
Generating Adaptation Policies for Multi-tier Applications in Consolidated Server Environments
Creating good adaptation policies is critical to building complex autonomic systems since it is such policies that define the system configuration used in any given situation. W...
Gueyoung Jung, Kaustubh R. Joshi, Matti A. Hiltune...
IDEAL
2004
Springer
13 years 11 months ago
Policy Gradient Method for Team Markov Games
The main aim of this paper is to extend the single-agent policy gradient method for multiagent domains where all agents share the same utility function. We formulate these team pro...
Ville Könönen
NECO
2010
13 years 3 months ago
Derivatives of Logarithmic Stationary Distributions for Policy Gradient Reinforcement Learning
Most conventional Policy Gradient Reinforcement Learning (PGRL) algorithms neglect (or do not explicitly make use of) a term in the average reward gradient with respect to the pol...
Tetsuro Morimura, Eiji Uchibe, Junichiro Yoshimoto...