Policy gradient methods for reinforcement learning avoid some of the undesirable properties of the value function approaches, such as policy degradation (Baxter and Bartlett, 2001...
Evan Greensmith, Peter L. Bartlett, Jonathan Baxte...
We describe a framework and equations used to model and predict the behavior of multi-agent systems (MASs) with learning agents. A difference equation is used for calculating the ...
We explore dynamic shaping to integrate our prior beliefs of the final policy into a conventional reinforcement learning system. Shaping provides a positive or negative artificial...
In Multi-Agent learning, agents must learn to select actions that maximize their utility given the action choices of the other agents. Cooperative Coevolution offers a way to evol...
In this paper, we study the relationship between the two techniques known as ant colony optimization (aco) and stochastic gradient descent. More precisely, we show that some empir...