Efficient Behavior Learning Based on State Value Estimation of Self and Others

15 years 4 months ago

Download www.er.ams.eng.osaka-u.ac.jp

Existing reinforcement learning methods suffer seriously from the curse of dimensionality, especially when they are applied to multiagent dynamic environments. RoboCup competitions are a typical example, since other agents and their behavior easily cause the state and action spaces to explode. This paper presents a method of modular learning in a multiagent environment by which the learning agent can acquire cooperative behavior with its teammates and competitive behavior against its opponents. The key ideas for resolving the issue are as follows. First, a two-layer hierarchical system with multiple learning modules is adopted to reduce the size of the sensor and action spaces. The state space of the top layer consists of the state values from the lower layer, and macro actions are used to reduce the size of the physical action space. Second, the state of the other agent, that is, how close it is to its own goal, is estimated by observation and used as a state var...
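The first idea, a top layer whose state is built from lower-layer state values, can be illustrated with a minimal sketch. This is not the authors' implementation: the class `LowerModule`, the helpers `discretize` and `top_layer_state`, the Q-table numbers, the value range [0, 1], and the choice of four bins are all hypothetical and chosen only for illustration. The sketch assumes that each lower module keeps a small Q-table over its own local state and that its state value is the maximum action value in that state.

```python
from typing import Dict, List, Sequence, Tuple


class LowerModule:
    """One lower-layer learning module with its own small Q-table (hypothetical)."""

    def __init__(self, q_table: Dict[int, List[float]]) -> None:
        # Maps a local state index to the list of action values in that state.
        self.q_table = q_table

    def state_value(self, local_state: int) -> float:
        # Assumed definition: V(s) = max over actions of Q(s, a).
        return max(self.q_table[local_state])

    def greedy_action(self, local_state: int) -> int:
        # Index of the primitive action with the highest value.
        values = self.q_table[local_state]
        return values.index(max(values))


def discretize(value: float, n_bins: int,
               v_min: float = 0.0, v_max: float = 1.0) -> int:
    """Map a state value in [v_min, v_max] to an integer bin in [0, n_bins - 1]."""
    ratio = (value - v_min) / (v_max - v_min)
    return min(n_bins - 1, max(0, int(ratio * n_bins)))


def top_layer_state(modules: Sequence[LowerModule],
                    local_states: Sequence[int],
                    n_bins: int = 4) -> Tuple[int, ...]:
    """Build the top-layer state from the lower modules' state values."""
    return tuple(discretize(m.state_value(s), n_bins)
                 for m, s in zip(modules, local_states))


# Two illustrative modules with made-up action values.
shoot = LowerModule({0: [0.1, 0.9], 1: [0.2, 0.3]})
pass_ball = LowerModule({0: [0.5, 0.4], 1: [0.0, 0.05]})

# The top layer sees one small integer per module instead of raw sensor data.
state = top_layer_state([shoot, pass_ball], [0, 1])
print(state)
```

With one bin index per module, the top-layer state space has at most `n_bins ** len(modules)` entries regardless of the sensor dimensionality. A macro action in this sketch would amount to selecting one module and executing its `greedy_action`, which is again an assumption rather than a detail stated in the abstract.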

Yasutake Takahashi, Kentarou Noma, Minoru Asada
