Reinforcement learning algorithms that employ neural networks as function approximators have proven to be powerful tools for solving optimal control problems. However, their traini...
Abstract. We investigate the problem of using function approximation in reinforcement learning where the agent’s policy is represented as a classifier mapping states to actions....
The Maximum Differential Backlog (MDB) control policy of Tassiulas and Ephremides has been shown to adaptively maximize the stable throughput of multihop wireless networks with ran...
In this paper, we propose a new lower approximation scheme for POMDP with discounted and average cost criterion. The approximating functions are determined by their values at a fi...
Abstract— This paper proposes a simulation-based active policy learning algorithm for finite-horizon, partially-observed sequential decision processes. The algorithm is tested i...
Ruben Martinez-Cantin, Nando de Freitas, Arnaud Do...