Abstract— This paper proposes a simulation-based active policy learning algorithm for finite-horizon, partially-observed sequential decision processes. The algorithm is tested i...
Ruben Martinez-Cantin, Nando de Freitas, Arnaud Do...
Temporal difference reinforcement learning algorithms are perfectly suited to autonomous agents because they learn directly from an agent’s experience based on sequential actio...