While exploring to nd better solutions, an agent performing online reinforcement learning (RL) can perform worse than is acceptable. In some cases, exploration might have unsafe, ...
Satinder P. Singh, Andrew G. Barto, Roderic A. Gru...
We present a new method for estimating the expected return of a POMDP from experience. The estimator does not assume any knowledge of the POMDP, can estimate the returns for finit...
From a conceptual point of view, belief revision and learning are quite similar. Both methods change the belief state of an intelligent agent by processing incoming information. Ho...
Thomas Leopold, Gabriele Kern-Isberner, Gabriele P...
Shaping can be an effective method for improving the learning rate in reinforcement systems. Previously, shaping has been heuristically motivated and implemented. We provide a for...
Imitation can be viewed as a means of enhancing learning in multiagent environments. It augments an agent’s ability to learn useful behaviors by making intelligent use of the kn...