Reinforcement learning models generally assume that a stimulus is presented that allows a learner to unambiguously identify the state of nature, and the reward received is drawn f...
Tobias Larsen, David S. Leslie, Edmund J. Collins,...
We prove a quantitative connection between the expected sum of rewards of a policy and binary classification performance on created subproblems. This connection holds without any ...
Transfer learning concerns applying knowledge learned in one task (the source) to improve learning another related task (the target). In this paper, we use structure mapping, a ps...
When the transition probabilities and rewards of a Markov Decision Process are specified exactly, the problem can be solved without any interaction with the environment. When no s...
This paper discusses theoretical and experimental aspects of gradient-based approaches to the direct optimization of policy performance in controlled ??? ?s. We introduce ??? ?, a...