Sciweavers

7 search results - page 1 / 2
» Finite time bounds for sampling based fitted value iteration
ICML
2005
IEEE
14 years 5 months ago
Finite time bounds for sampling based fitted value iteration
In this paper we consider sampling based fitted value iteration for discounted, large (possibly infinite) state space, finite action Markovian Decision Problems where only a gener...
Csaba Szepesvári, Rémi Munos
JMLR
2008
13 years 4 months ago
Finite-Time Bounds for Fitted Value Iteration
In this paper we develop a theoretical analysis of the performance of sampling-based fitted value iteration (FVI) to solve infinite state-space, discounted-reward Markovian decisi...
Rémi Munos, Csaba Szepesvári
AAAI
2012
11 years 7 months ago
Generalized Sampling and Variance in Counterfactual Regret Minimization
In large extensive form games with imperfect information, Counterfactual Regret Minimization (CFR) is a popular, iterative algorithm for computing approximate Nash equilibria. Whi...
Richard G. Gibson, Marc Lanctot, Neil Burch, Duane...
NIPS
1998
13 years 6 months ago
Finite-Sample Convergence Rates for Q-Learning and Indirect Algorithms
In this paper, we address two issues of long-standing interest in the reinforcement learning literature. First, what kinds of performance guarantees can be made for Q-learning aft...
Michael J. Kearns, Satinder P. Singh
CORR
2010
Springer
13 years 4 months ago
Gaussian Process Bandits for Tree Search
We motivate and analyse a new Tree Search algorithm, based on recent advances in the use of Gaussian Processes for bandit problems. We assume that the function to maximise on the ...
Louis Dorard, John Shawe-Taylor