We present the first temporal-difference learning algorithm for off-policy control with unrestricted linear function approximation whose per-time-step complexity is linear in the ...
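The abstract above is cut off before it gives the update rule. To make "per-time-step complexity linear in the number of features" concrete, here is a minimal sketch of a gradient-TD control step in the Greedy-GQ form published by Maei et al. Treating it as the algorithm this abstract describes is an assumption. `dot`, `greedy_gq_step`, and the step sizes `alpha` and `beta` are illustrative names, not taken from the source.

```python
def dot(u, v):
    """Inner product of two equal-length feature vectors: O(n)."""
    return sum(a * b for a, b in zip(u, v))


def greedy_gq_step(theta, w, phi, reward, next_phis, gamma, alpha, beta):
    """One Greedy-GQ-style update (illustrative sketch).

    theta:     primary weights, Q(s, a) ~ theta . phi(s, a)
    w:         secondary weights used for the gradient correction
    phi:       feature vector of the current (s, a)
    next_phis: feature vectors phi(s', b), one per action b; empty if s' is terminal
    """
    # Greedy target action under the current weights (O(n) per action).
    if next_phis:
        phi_next = max(next_phis, key=lambda f: dot(theta, f))
    else:
        phi_next = [0.0] * len(phi)  # terminal: bootstrap value is zero
    # TD error toward the greedy (off-policy) target.
    delta = reward + gamma * dot(theta, phi_next) - dot(theta, phi)
    w_phi = dot(w, phi)
    # Every update is elementwise over n features, so no n-by-n matrix is stored.
    new_theta = [t + alpha * (delta * p - gamma * w_phi * pn)
                 for t, p, pn in zip(theta, phi, phi_next)]
    new_w = [wi + beta * (delta - w_phi) * p for wi, p in zip(w, phi)]
    return new_theta, new_w
```

Each step performs only dot products and elementwise vector updates. Its cost is therefore O(n) per action evaluated, in contrast with least-squares methods that maintain an n-by-n matrix.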
Conventional wisdom attributes the lack of effective technology use in classrooms to too little professional development or to poorly run professional development. At the same time...
How should a reinforcement learning agent act if its sole purpose is to efficiently learn an optimal policy for later use? In other words, how should it explore, to be able to exp...
Least-squares policy iteration is a useful reinforcement learning method in robotics due to its computational efficiency. However, it tends to be sensitive to outliers...
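The abstract is cut off before it describes its remedy. To show where the outlier sensitivity can come from, here is a minimal sketch of LSTD-Q, the least-squares policy-evaluation step inside LSPI (Lagoudakis and Parr). The sketch accumulates A = sum of phi(s,a) (phi(s,a) - gamma phi(s', pi(s')))^T and b = sum of phi(s,a) r, then solves A w = b. Because b is a plain sum of rewards, a single corrupted reward shifts the solution in proportion to its magnitude. The names `lstdq`, `solve`, and `policy_phi`, and the small ridge term `reg`, are hypothetical. This is not the robust method the abstract proposes.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def lstdq(samples, gamma, policy_phi, k, reg=1e-6):
    """LSTD-Q sketch.

    samples:    iterable of (phi_sa, reward, next_state)
    policy_phi: hypothetical callback giving phi(s', pi(s')); zeros for terminal s'
    k:          number of features; reg is an assumed ridge term keeping A invertible
    """
    A = [[reg if i == j else 0.0 for j in range(k)] for i in range(k)]
    b = [0.0] * k
    for phi, r, s_next in samples:
        phi_next = policy_phi(s_next)
        diff = [p - gamma * q for p, q in zip(phi, phi_next)]
        for i in range(k):
            b[i] += phi[i] * r  # every reward enters b linearly
            for j in range(k):
                A[i][j] += phi[i] * diff[j]
    return solve(A, b)
```

With one feature and terminal next states, LSTD-Q reduces to averaging rewards. For example, `lstdq([([1.0], r, None) for r in [1, 1, 1, 101]], 0.9, lambda s: [0.0], 1)` returns roughly 26 instead of the 1 that the uncorrupted rewards would give.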
There is a close relationship between harmonic functions, which have recently been proposed for path planning, and hitting probabilities for random processes. The hitting probab...
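The relationship can be checked numerically on a small grid. This minimal sketch makes several assumptions. The grid is 4-connected. Goal cells are fixed at 1 and obstacle cells at 0. Free cells are relaxed to the average of their in-bounds neighbours. A random walk moves to a uniformly chosen in-bounds neighbour. Under these assumptions, the relaxed value at a free cell should match the probability that a walk started there reaches the goal before an obstacle. The helpers `neighbors`, `harmonic`, and `hitting_probability` are hypothetical names.

```python
import random


def neighbors(cell, rows, cols):
    """In-bounds 4-connected neighbours of a grid cell."""
    r, c = cell
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < rows and 0 <= nc < cols:
            yield (nr, nc)


def harmonic(rows, cols, goal, obstacles, iters=5000):
    """Gauss-Seidel relaxation: u = 1 on goal, 0 on obstacles, neighbour average elsewhere."""
    u = {(r, c): 0.0 for r in range(rows) for c in range(cols)}
    for g in goal:
        u[g] = 1.0
    free = [cell for cell in u if cell not in goal and cell not in obstacles]
    for _ in range(iters):
        for cell in free:
            nbrs = list(neighbors(cell, rows, cols))
            u[cell] = sum(u[n] for n in nbrs) / len(nbrs)
    return u


def hitting_probability(start, rows, cols, goal, obstacles, trials=20000, seed=0):
    """Monte Carlo estimate of P(random walk from start hits goal before an obstacle)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        cell = start
        while cell not in goal and cell not in obstacles:
            cell = rng.choice(list(neighbors(cell, rows, cols)))
        hits += cell in goal
    return hits / trials
```

On a 1-by-5 corridor with an obstacle at one end and the goal at the other, both functions approximate the gambler's-ruin answer i/4 at cell i. In particular, both give about 0.5 at the middle cell.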