Next location prediction anticipates a person's movement based on the history of previous sojourns. It is useful for proactive actions taken to assist the person in a ubiquit...
Jan Petzold, Faruk Bagci, Wolfgang Trumler, Theo U...
We consider approximate policy evaluation for finite state and action Markov decision processes (MDPs) in the off-policy learning context and with the simulation-based least square...
We give the first rigorous upper bounds on the error of temporal difference (TD) algorithms for policy evaluation as a function of the amount of experience. These upper bounds pr...
Recurrent neural networks are able to store information about previous as well as current inputs. This "memory" allows them to solve temporal problems such as language r...
We propose a new approach to reinforcement learning which combines least squares function approximation with policy iteration. Our method is model-free and completely off-policy. ...