The goal of approximate policy evaluation is to “best” represent a target value function according to a specific criterion. Temporal difference methods and Bellman residual m...
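A minimal numerical sketch of the two criteria named above (the temporal-difference / projected fixed point versus Bellman residual minimisation), on an assumed 3-state Markov reward process with linear features; the transition matrix, rewards, and basis below are illustrative, not taken from the paper:

```python
import numpy as np

# Illustrative 3-state Markov reward process (assumed for this sketch).
P = np.array([[0.7, 0.3, 0.0],
              [0.0, 0.6, 0.4],
              [0.2, 0.0, 0.8]])          # transition probabilities
r = np.array([0.0, 1.0, 2.0])            # expected one-step rewards
gamma = 0.9
Phi = np.array([[1.0, 0.0],
                [1.0, 1.0],
                [1.0, 2.0]])             # linear basis functions
d = np.array([1/3, 1/3, 1/3])            # state weighting
D = np.diag(d)

# TD / LSTD criterion: solve the projected Bellman fixed point
#   Phi^T D (Phi - gamma P Phi) theta = Phi^T D r
A_td = Phi.T @ D @ (Phi - gamma * P @ Phi)
b = Phi.T @ D @ r
theta_td = np.linalg.solve(A_td, b)

# Bellman residual minimisation:
#   minimise || Phi theta - (r + gamma P Phi theta) ||_D^2
M = Phi - gamma * P @ Phi
theta_br = np.linalg.lstsq(np.sqrt(d)[:, None] * M, np.sqrt(d) * r, rcond=None)[0]

print("TD fixed-point weights:  ", theta_td)
print("Bellman-residual weights:", theta_br)
print("True values:             ", np.linalg.solve(np.eye(3) - gamma * P, r))
```

The two weight vectors generally differ, which is exactly why the choice of criterion matters for approximate policy evaluation.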
Off-policy reinforcement learning is aimed at efficiently reusing data samples gathered in the past, which is an essential problem for physically grounded AI as experiments are us...
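One standard device for reusing samples gathered under a different (behaviour) policy is importance weighting; the tiny sketch below illustrates only that general idea, and the policies, trajectory, and numbers are illustrative assumptions rather than the specific method of this work:

```python
import numpy as np

# Hypothetical trajectory collected under a behaviour policy b, to be reused
# for evaluating a different target policy pi (all numbers are illustrative).
states  = [0, 1, 2]
actions = [1, 0, 1]
rewards = [0.5, 1.0, 2.0]
gamma = 0.95

# Action probabilities pi(a|s) and b(a|s) over the two actions {0, 1}.
pi = {0: [0.2, 0.8], 1: [0.5, 0.5], 2: [0.1, 0.9]}
b  = {0: [0.5, 0.5], 1: [0.5, 0.5], 2: [0.5, 0.5]}

# Trajectory-wise importance weight: product of pi(a_t|s_t) / b(a_t|s_t).
w = np.prod([pi[s][a] / b[s][a] for s, a in zip(states, actions)])

# Importance-weighted estimate of the discounted return under pi.
G = sum(gamma**t * rew for t, rew in enumerate(rewards))
print("importance weight:", w)
print("weighted return estimate:", w * G)
```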
In reinforcement learning, it is a common practice to map the state(-action) space to a different one using basis functions. This transformation aims to represent the input data i...
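As a concrete (assumed) example of such a basis-function mapping, the sketch below transforms a one-dimensional state into Gaussian radial-basis features and represents the value function linearly in that feature space; the centres, width, and weights are illustrative:

```python
import numpy as np

def rbf_features(s, centres, width=0.5):
    """Map a scalar state s to Gaussian radial-basis features."""
    return np.exp(-((s - centres) ** 2) / (2.0 * width ** 2))

# Illustrative basis: five Gaussian bumps spread over the state interval [0, 1].
centres = np.linspace(0.0, 1.0, 5)

# With linear function approximation, the value estimate is a dot product
# between the feature vector and a learned weight vector.
w = np.array([0.1, 0.4, 0.9, 0.4, 0.1])   # assumed weights, for illustration only

def value(s):
    return rbf_features(s, centres) @ w

print("features of s=0.3:", rbf_features(0.3, centres))
print("V(0.3) ~", value(0.3))
```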
We consider the problem of characterisation of sequences of heterogeneous symbolic data that arise from a common underlying temporal pattern. The data, which are subject to impreci...
Here we advocate an approach to learning hardware based on induction of finite state machines from temporal logic constraints. The method involves training on examples, constraint...
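A minimal sketch of the induction step on which such an approach rests: a brute-force search for a small Mealy machine consistent with a few input/output training examples. The traces, alphabet, and state bound are illustrative assumptions, and the temporal-logic constraints of the full method are not modelled here:

```python
from itertools import product

# Training examples: (input sequence, expected output sequence), binary alphabet.
examples = [
    ([0, 1, 1, 0], [0, 1, 0, 0]),
    ([1, 1, 0],    [1, 0, 0]),
]
N_STATES = 2  # bound on machine size for the brute-force search

def run(machine, inputs):
    """Run a Mealy machine {(state, symbol): (next_state, output)} from state 0."""
    state, outputs = 0, []
    for x in inputs:
        state, y = machine[(state, x)]
        outputs.append(y)
    return outputs

def induce():
    keys = [(q, x) for q in range(N_STATES) for x in (0, 1)]
    # Enumerate every assignment of (next_state, output) to each (state, input) pair.
    for choice in product(product(range(N_STATES), (0, 1)), repeat=len(keys)):
        machine = dict(zip(keys, choice))
        if all(run(machine, xs) == ys for xs, ys in examples):
            return machine
    return None

print("consistent machine:", induce())
```

In practice such enumeration is replaced by constraint-driven search, but the consistency check against example traces is the same.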
Marek A. Perkowski, Alan Mishchenko, Anatoli N. Ch...