AI
2002
Springer

Multiagent learning using a variable learning rate

Learning to act in a multiagent environment is a difficult problem since the normal definition of an optimal policy no longer applies. The optimal policy at any moment depends on the policies of the other agents. This creates a situation of learning a moving target. Previous learning algorithms have one of two shortcomings depending on their approach. They either converge to a policy that may not be optimal against the specific opponents' policies, or they may not converge at all. In this article we examine this learning problem in the framework of stochastic games. We look at a number of previous learning algorithms showing how they fail at one of the above criteria. We then contribute a new reinforcement learning technique using a variable learning rate to overcome these shortcomings. Specifically, we introduce the WoLF principle, "Win or Learn Fast," for varying the learning rate. We examine this technique theoretically, proving convergence in self-play on a restrict...
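The abstract names the WoLF principle but does not spell out the rule for varying the learning rate. The sketch below is one minimal reading of "Win or Learn Fast": use a small step size when the agent is winning and a larger one when it is losing. It assumes that "winning" means the current policy has a higher expected value than the agent's own long-run average policy, as in the policy hill-climbing variant of WoLF. The function name, the argument names, and the two step sizes are hypothetical and chosen only for illustration.

```python
def wolf_learning_rate(action_values, current_policy, average_policy,
                       delta_win=0.01, delta_lose=0.04):
    """Return the step size to use for the next policy update.

    action_values  : estimated value of each action (hypothetical input)
    current_policy : probability of each action under the current policy
    average_policy : probability of each action averaged over past play
    delta_win      : small step size, used while winning (assumed value)
    delta_lose     : larger step size, used while losing (assumed value)
    """
    if not delta_lose > delta_win:
        raise ValueError("WoLF needs delta_lose > delta_win")

    # Expected value of each policy under the current value estimates.
    current_value = sum(p * v for p, v in zip(current_policy, action_values))
    average_value = sum(p * v for p, v in zip(average_policy, action_values))

    # Assumed test for "winning": the current policy beats the average one.
    winning = current_value > average_value

    # Win: adapt cautiously. Lose: learn fast.
    return delta_win if winning else delta_lose


if __name__ == "__main__":
    values = [1.0, 0.0]
    # Current policy favours the better action, so the agent is "winning".
    print(wolf_learning_rate(values, [0.8, 0.2], [0.5, 0.5]))
    # Current policy favours the worse action, so the agent is "losing".
    print(wolf_learning_rate(values, [0.2, 0.8], [0.5, 0.5]))
```

The returned step size would then scale whatever policy update the learner uses. Keeping the losing rate larger than the winning rate is the one constraint the sketch enforces.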

Michael H. Bowling, Manuela M. Veloso

AI 2002 | Artificial Intelligence | Optimal Policy | Reinforcement Learning | Stochastic Games

Related Content

» Convergence of Gradient Dynamics with a Variable Learning Rate

» Using adaptive consultation of experts to improve convergence rates in multiagent learning

» Simultaneous Adversarial Multi-Robot Learning

» Agent Learning using Action-Dependent Learning Rates in Computer Role-Playing Games

» The Moving Target Function Problem in Multi-Agent Learning

» Exploiting locality of interactions using a policy-gradient approach in multiagent learning

» Bounding the False Discovery Rate in Local Bayesian Network Learning

» Rates of Convergence for Variable Resolution Schemes in Optimal Control

» Fuzzy logic based variable step size algorithm for blind delayed source separation

Post Info

Added	16 Dec 2010
Updated	16 Dec 2010
Type	Journal
Year	2002
Where	AI
Authors	Michael H. Bowling, Manuela M. Veloso