We consider the problem of incorporating end-user advice into reinforcement learning (RL). In our setting, the learner alternates between practicing, where learning is based on ac...
Kshitij Judah, Saikat Roy, Alan Fern, Thomas G. Di...
We introduce the first algorithm for off-policy temporal-difference learning that is stable with linear function approximation. Off-policy learning is of interest because it forms...
This paper presents the CQ algorithm which decomposes and solves a Markov Decision Process (MDP) by automatically generating a hierarchy of smaller MDPs using state variables. The ...
We present the first real-world benchmark for sequentiallyoptimal team formation, working within the framework of a class of online football prediction games known as Fantasy Foo...
Tim Matthews, Sarvapali D. Ramchurn, Georgios Chal...
Fusion of multimedia streams for enhanced performance is a critical problem for retrieval. However, fusion performance tends to easily overfit the hillclimb set used to learn fus...