ATAL 2015, Springer

Counterfactual Exploration for Improving Multiagent Learning

In any single-agent system, exploration is a critical component of learning: it ensures that all possible actions receive some degree of attention, allowing the agent to converge to good policies. The same concept has been adopted by multiagent learning systems. However, multiagent learning has a fundamentally different dynamic: each agent operates in a non-stationary environment, a direct result of the evolving policies of the other agents in the system. As a consequence, exploratory actions taken by one agent bias the policies of the others, forcing them to optimize their behavior in the presence of agent exploration. CLEAN rewards address this issue by privatizing exploration (agents take their best action, but internally compute rewards for counterfactual actions). However, CLEAN rewards require each agent to know the mathematical form of the system evaluation function, which is typically unavailable to agents. In this paper, we present an algorithm to approximate CLEAN rewards, eliminating...
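The core idea of a CLEAN-style reward, as the abstract describes it, is that an agent acts greedily in the real system (so its exploration never perturbs the other agents) while privately scoring a counterfactual exploratory action against the system evaluation function. A minimal sketch of that computation, assuming a toy, hypothetical global evaluation `G` (a congestion-style objective invented here for illustration, not the paper's domain):

```python
# Toy global evaluation G: each agent scores its chosen action's value,
# discounted by how many agents chose the same action (congestion).
# Hypothetical stand-in for the system evaluation function the paper
# assumes agents would need access to.
def G(joint_action):
    counts = {}
    for a in joint_action:
        counts[a] = counts.get(a, 0) + 1
    return sum(a / counts[a] for a in joint_action)

def clean_reward(joint_action, i, counterfactual_action):
    """CLEAN-style counterfactual reward for agent i: the change in G if
    agent i had taken the exploratory action instead of its actual greedy
    action, holding all other agents' actions fixed."""
    actual = G(joint_action)
    cf = list(joint_action)
    cf[i] = counterfactual_action
    return G(cf) - actual

# All agents act greedily, so no exploratory noise enters the environment;
# agent 0 privately evaluates switching to the less congested action 2.
print(clean_reward([3, 3, 3], 0, counterfactual_action=2))  # → 2.0
```

The key point the sketch illustrates: the counterfactual action never appears in `joint_action`, so the other agents' learning is never biased by it; exploration happens entirely inside agent `i`'s reward computation. The paper's contribution is approximating this quantity when `G` itself is not available to the agents.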
Type Journal
Year 2015
Where ATAL
Authors Mitchell K. Colby, Sepideh Kharaghani, Chris HolmesParker, Kagan Tumer