Sciweavers

115 search results - page 15 / 23
» Recurrent policy gradients
ICML
2010
IEEE
15 years 25 days ago
Toward Off-Policy Learning Control with Function Approximation
We present the first temporal-difference learning algorithm for off-policy control with unrestricted linear function approximation whose per-time-step complexity is linear in the ...
Hamid Reza Maei, Csaba Szepesvári, Shalabh ...
CEC
2011
IEEE
13 years 11 months ago
Stochastic Natural Gradient Descent by estimation of empirical covariances
Stochastic relaxation aims at finding the minimum of a fitness function by identifying a proper sequence of distributions, in a given model, that minimize the expected value o...
Luigi Malagò, Matteo Matteucci, Giovanni Pi...
ESANN
2007
15 years 1 month ago
Applying the Episodic Natural Actor-Critic Architecture to Motor Primitive Learning
In this paper, we investigate motor primitive learning with the Natural Actor-Critic approach. The Natural Actor-Critic consists of actor updates which are achieved using natur...
Jan Peters, Stefan Schaal
NCI
2004
15 years 1 month ago
Hierarchical reinforcement learning with subpolicies specializing for learned subgoals
This paper describes a method for hierarchical reinforcement learning in which high-level policies automatically discover subgoals, and low-level policies learn to specialize for ...
Bram Bakker, Jürgen Schmidhuber
PCI
2005
Springer
15 years 5 months ago
TSIC: Thermal Scheduling Simulator for Chip Multiprocessors
Increased power density, hot-spots, and temperature gradients are severe limiting factors for today’s state-of-the-art microprocessors. However, the flexibility offer...
Kyriakos Stavrou, Pedro Trancoso