Adaptive Time Warp protocols in the literature are usually based on a pre-defined analytic model of the system, expressed as a closed-form function that maps system state to cont...
In this paper we consider the problem of policy evaluation in reinforcement learning, i.e., learning the value function of a fixed policy, using the least-squares temporal-differe...
Alessandro Lazaric, Mohammad Ghavamzadeh, Ré...
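For readers unfamiliar with the estimator named in the abstract above, the following is a minimal sketch of textbook batch LSTD(0) for a fixed policy. It is not taken from the paper, whose own analysis is not visible in this listing. The function names `lstd`, `solve_linear_system`, and `one_hot`, the three-state chain, and the discount factor are all illustrative assumptions.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular; LSTD needs more samples")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def lstd(samples, features, gamma, dim):
    """Batch LSTD(0): solve A w = b with
    A = sum phi(s) (phi(s) - gamma * phi(s'))^T and b = sum phi(s) * r."""
    a = [[0.0] * dim for _ in range(dim)]
    b = [0.0] * dim
    for state, reward, next_state in samples:
        phi = features(state)
        phi_next = features(next_state)
        for i in range(dim):
            b[i] += phi[i] * reward
            for j in range(dim):
                a[i][j] += phi[i] * (phi[j] - gamma * phi_next[j])
    return solve_linear_system(a, b)


def one_hot(state, dim=3):
    """Hypothetical tabular features; the terminal state (None) maps to zeros."""
    vec = [0.0] * dim
    if state is not None:
        vec[state] = 1.0
    return vec


# Illustrative chain 0 -> 1 -> 2 -> terminal, reward 1 on the final step.
samples = [(0, 0.0, 1), (1, 0.0, 2), (2, 1.0, None)]
weights = lstd(samples, one_hot, gamma=0.5, dim=3)
```

With one-hot features the learned weights are the state values themselves, so on this chain they should match the discounted returns 0.25, 0.5, and 1.0 for states 0, 1, and 2.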
Prioritized sweeping is a model-based reinforcement learning method that attempts to focus an agent’s limited computational resources to achieve a good estimate of the value of ...
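As a rough illustration of the idea described in the abstract above, here is a minimal sketch of the textbook planning form of prioritized sweeping, assuming the transition model and rewards are already known. It is not the method of the paper itself, which is truncated in this listing. The function name `prioritized_sweeping`, the data layout, and the four-state chain are hypothetical choices made for the example.

```python
import heapq


def prioritized_sweeping(transitions, rewards, gamma=0.9, theta=1e-6,
                         max_updates=10_000):
    """transitions[s][a] is a list of (probability, next_state) pairs;
    rewards[s][a] is the expected immediate reward. A state with no
    actions is treated as terminal. States must be orderable (ints here)
    so that heap ties can be broken."""
    values = {s: 0.0 for s in transitions}

    # Record which states can lead to each state.
    predecessors = {s: set() for s in transitions}
    for s, actions in transitions.items():
        for outcomes in actions.values():
            for prob, s_next in outcomes:
                if prob > 0:
                    predecessors[s_next].add(s)

    def backup(s):
        """One-step Bellman optimality backup for state s."""
        if not transitions[s]:
            return 0.0
        return max(
            rewards[s][a] + gamma * sum(p * values[s2] for p, s2 in outcomes)
            for a, outcomes in transitions[s].items()
        )

    # Seed the queue with every state whose Bellman error exceeds theta.
    queue = []
    for s in transitions:
        error = abs(backup(s) - values[s])
        if error > theta:
            heapq.heappush(queue, (-error, s))  # negate: heapq is a min-heap

    updates = 0
    while queue and updates < max_updates:
        _, s = heapq.heappop(queue)
        values[s] = backup(s)
        updates += 1
        # Only predecessors of s can have their backups changed by this update.
        for pred in predecessors[s]:
            error = abs(backup(pred) - values[pred])
            if error > theta:
                heapq.heappush(queue, (-error, pred))
    return values, updates


# Illustrative chain 0 -> 1 -> 2 -> 3 (terminal), reward 1 on entering state 3.
transitions = {
    0: {"right": [(1.0, 1)]},
    1: {"right": [(1.0, 2)]},
    2: {"right": [(1.0, 3)]},
    3: {},
}
rewards = {0: {"right": 0.0}, 1: {"right": 0.0}, 2: {"right": 1.0}, 3: {}}
values, updates = prioritized_sweeping(transitions, rewards, gamma=0.5)
```

On this chain the queue starts with only state 2, and value information then propagates backward one predecessor at a time, so three backups suffice where a full sweep over all states would spend most of its work on states whose values cannot yet change.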
This paper presents a novel method for on-line coordination in multiagent reinforcement learning systems. In this method a reinforcement-learning agent learns to select its action ...
Operations research and management science are often confronted with sequential decision-making problems with large state spaces. Standard methods that are used for solving such c...