Unifying Convergence and No-Regret in Multiagent Learning

We present a new multiagent learning algorithm, RVσ(t), that builds on an earlier version, ReDVaLeR. ReDVaLeR could guarantee (a) convergence to best response against stationary opponents and either (b) constant bounded regret against arbitrary opponents, or (c) convergence to Nash equilibrium policies in self-play. But it makes two strong assumptions: (1) that it can distinguish between self-play and otherwise non-stationary agents and (2) that all agents know their portions of the same equilibrium in self-play. We show that the adaptive learning rate of RVσ(t), which depends explicitly on time, can overcome both of these assumptions. Consequently, RVσ(t) theoretically achieves (a’) convergence to near-best response against eventually stationary opponents, (b’) no-regret payoff against arbitrary opponents and (c’) convergence to some Nash equilibrium policy in some classes of games, in self-play. Each agent now needs to know its portion of any equilibrium, and does not need t...
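To make the terms "no-regret" and "learning rate that depends on time" concrete, here is a minimal sketch in Python. It is not RVσ(t) or ReDVaLeR, whose update rules are not given in this abstract. It swaps in the standard Hedge (multiplicative weights) learner with an assumed schedule eta_t = sqrt(ln n / t), and measures average external regret, i.e. the per-round gap between the best fixed action in hindsight and the learner's expected payoff. The function name, the payoff format and the schedule are all illustrative assumptions.

```python
import math


def hedge_average_regret(payoffs):
    """Average external regret of Hedge with a time-dependent learning rate.

    payoffs[t][a] is the payoff in [0, 1] of action a at round t + 1.
    Generic illustration only; this is NOT the paper's RVσ(t) update.
    """
    n = len(payoffs[0])
    cum = [0.0] * n   # cumulative payoff of each fixed action so far
    earned = 0.0      # learner's cumulative expected payoff
    for t, row in enumerate(payoffs, start=1):
        # Assumed schedule: the learning rate shrinks as rounds accumulate.
        eta = math.sqrt(math.log(n) / t)
        # Subtract the max before exponentiating for numerical stability.
        top = max(cum)
        weights = [math.exp(eta * (c - top)) for c in cum]
        total = sum(weights)
        probs = [w / total for w in weights]
        # Expected payoff of the mixed policy played this round.
        earned += sum(p * r for p, r in zip(probs, row))
        cum = [c + r for c, r in zip(cum, row)]
    # Gap to the best fixed action in hindsight, averaged over rounds.
    return (max(cum) - earned) / len(payoffs)


if __name__ == "__main__":
    # Stationary opponent: action 0 always pays 1, action 1 always pays 0.
    for horizon in (10, 100, 1000):
        rounds = [[1.0, 0.0]] * horizon
        print(horizon, hedge_average_regret(rounds))
```

Against this stationary payoff sequence the average regret shrinks as the horizon grows, which is the sense of "no-regret" used in property (b’). The same learner also concentrates on the best response, loosely mirroring property (a’).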

Bikramjit Banerjee, Jing Peng

Arbitrary Opponents | Convergence | LAMAS 2005 | Machine Learning | Stationary Opponents |

Added	28 Jun 2010
Updated	28 Jun 2010
Type	Conference
Year	2005
Where	LAMAS
Authors	Bikramjit Banerjee, Jing Peng
