In this paper we consider sampling-based fitted value iteration for discounted, large (possibly infinite) state-space, finite-action Markovian Decision Problems where only a gener...
In this paper we develop a theoretical analysis of the performance of sampling-based fitted value iteration (FVI) to solve infinite state-space, discounted-reward Markovian decisi...
In large extensive-form games with imperfect information, Counterfactual Regret Minimization (CFR) is a popular, iterative algorithm for computing approximate Nash equilibria. Whi...
Richard G. Gibson, Marc Lanctot, Neil Burch, Duane...
In this paper, we address two issues of long-standing interest in the reinforcement learning literature. First, what kinds of performance guarantees can be made for Q-learning aft...
We motivate and analyse a new Tree Search algorithm, based on recent advances in the use of Gaussian Processes for bandit problems. We assume that the function to maximise on the ...