Search Sciweavers | Sciweavers

86

NIPS
2001

121views Information Technology» more NIPS 2001»

Rates of Convergence of Performance Gradient Estimates Using Function Approximation and Bias in Reinforcement Learning

15 years 1 months ago

We address two open theoretical questions in Policy Gradient Reinforcement Learning. The first concerns the efficacy of using function approximation to represent the state action ...

Gregory Z. Grudic, Lyle H. Ungar

claim paper

Read More »

88

click to vote

AAAI
2010

171views Intelligent Agents» more AAAI 2010»

Multi-Agent Learning with Policy Prediction

15 years 1 months ago

Download www.cs.umass.edu

Due to the non-stationary environment, learning in multi-agent systems is a challenging problem. This paper first introduces a new gradient-based learning algorithm, augmenting th...

Chongjie Zhang, Victor R. Lesser

claim paper

Read More »

77

click to vote

ICML
2009
IEEE

131views Machine Learning» more ICML 2009»

Monte-Carlo simulation balancing

16 years 15 days ago

Download www.cs.ualberta.ca

In this paper we introduce the first algorithms for efficiently learning a simulation policy for Monte-Carlo search. Our main idea is to optimise the balance of a simulation polic...

David Silver, Gerald Tesauro

claim paper

Read More »

86

click to vote

ECAI
2008
Springer

124views Artificial Intelligence» more ECAI 2008»

Exploiting locality of interactions using a policy-gradient approach in multiagent learning

15 years 1 months ago

Download gaips.inesc-id.pt

In this paper, we propose a policy gradient reinforcement learning algorithm to address transition-independent Dec-POMDPs. This approach aims at implicitly exploiting the locality...

Francisco S. Melo

claim paper

Read More »

89

click to vote

NIPS
2001

144views Information Technology» more NIPS 2001»

Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning

15 years 1 months ago

Download jmlr.csail.mit.edu

Policy gradient methods for reinforcement learning avoid some of the undesirable properties of the value function approaches, such as policy degradation (Baxter and Bartlett, 2001...

Evan Greensmith, Peter L. Bartlett, Jonathan Baxte...

claim paper

Read More »

Sciweavers

Explore & Download

Productivity Tools

Document Tools

Image Tools

Sciweavers