Sciweavers

115 search results - page 15 / 23
» Recurrent policy gradients
Sort
View
ICML
2010
IEEE
14 years 10 months ago
Toward Off-Policy Learning Control with Function Approximation
We present the first temporal-difference learning algorithm for off-policy control with unrestricted linear function approximation whose per-time-step complexity is linear in the ...
Hamid Reza Maei, Csaba Szepesvári, Shalabh ...
104
Voted
CEC
2011
IEEE
13 years 9 months ago
Stochastic Natural Gradient Descent by estimation of empirical covariances
—Stochastic relaxation aims at finding the minimum of a fitness function by identifying a proper sequence of distributions, in a given model, that minimize the expected value o...
Luigi Malagò, Matteo Matteucci, Giovanni Pi...
83
Voted
ESANN
2007
14 years 11 months ago
Applying the Episodic Natural Actor-Critic Architecture to Motor Primitive Learning
In this paper, we investigate motor primitive learning with the Natural Actor-Critic approach. The Natural Actor-Critic consists out of actor updates which are achieved using natur...
Jan Peters, Stefan Schaal
NCI
2004
185views Neural Networks» more  NCI 2004»
14 years 11 months ago
Hierarchical reinforcement learning with subpolicies specializing for learned subgoals
This paper describes a method for hierarchical reinforcement learning in which high-level policies automatically discover subgoals, and low-level policies learn to specialize for ...
Bram Bakker, Jürgen Schmidhuber
62
Voted
PCI
2005
Springer
15 years 3 months ago
TSIC: Thermal Scheduling Simulator for Chip Multiprocessors
Abstract. Increased power density, hot-spots, and temperature gradients are severe limiting factors for today’s state-of-the-art microprocessors. However, the flexibility offer...
Kyriakos Stavrou, Pedro Trancoso