Abstract— This paper proposes a simulation-based active policy learning algorithm for finite-horizon, partially-observed sequential decision processes. The algorithm is tested i...
Ruben Martinez-Cantin, Nando de Freitas, Arnaud Do...
Modeling the behavior of imperfect agents from a small number of observations is a difficult, but important task. In the singleagent decision-theoretic setting, inverse optimal co...