We introduce the first algorithm for off-policy temporal-difference learning that is stable with linear function approximation. Off-policy learning is of interest because it forms...
Learning of recursive functions refutably means that for every recursive function, the learning machine has either to learn this function or to refute it, i.e., to signal...
Sanjay Jain, Efim B. Kinber, Rolf Wiehagen, Thomas...
Adaptive predictive search (APS) is a learning system framework which, given little initial domain knowledge, increases its decision-making abilities in complex problem domains....
Reinforcement learning (RL) algorithms provide a sound theoretical basis for building learning control architectures for embedded agents. Unfortunately, all of the theory and much ...
Satinder P. Singh, Tommi Jaakkola, Michael I. Jord...
We compute a common feature selection or kernel selection configuration for multiple support vector machines (SVMs) trained on different yet inter-related datasets. The method is ...