Search Sciweavers | Sciweavers

7 search results - page 1 / 2

» Finite time bounds for sampling based fitted value iteration

click to vote

ICML
2005
IEEE

135views Machine Learning» more ICML 2005»

Finite time bounds for sampling based fitted value iteration

14 years 5 months ago

Download www.machinelearning.org

In this paper we consider sampling based fitted value iteration for discounted, large (possibly infinite) state space, finite action Markovian Decision Problems where only a gener...

Csaba Szepesvári, Rémi Munos

claim paper

Read More »

click to vote

JMLR
2008

129views more JMLR 2008»

Finite-Time Bounds for Fitted Value Iteration

13 years 4 months ago

Download www.sztaki.hu

In this paper we develop a theoretical analysis of the performance of sampling-based fitted value iteration (FVI) to solve infinite state-space, discounted-reward Markovian decisi...

Rémi Munos, Csaba Szepesvári

claim paper

Read More »

click to vote

AAAI
2012

249views Intelligent Agents» more AAAI 2012»

Generalized Sampling and Variance in Counterfactual Regret Minimization

11 years 7 months ago

Download poker.cs.ualberta.ca

In large extensive form games with imperfect information, Counterfactual Regret Minimization (CFR) is a popular, iterative algorithm for computing approximate Nash equilibria. Whi...

Richard G. Gibson, Marc Lanctot, Neil Burch, Duane...

claim paper

Read More »

click to vote

NIPS
1998

164views Information Technology» more NIPS 1998»

Finite-Sample Convergence Rates for Q-Learning and Indirect Algorithms

13 years 6 months ago

Download www.cis.upenn.edu

In this paper, we address two issues of long-standing interest in the reinforcement learning literature. First, what kinds of performance guarantees can be made for Q-learning aft...

Michael J. Kearns, Satinder P. Singh

claim paper

Read More »

click to vote

CORR
2010
Springer

174views Education» more CORR 2010»

Gaussian Process Bandits for Tree Search

13 years 4 months ago

Download www.mendeley.com

We motivate and analyse a new Tree Search algorithm, based on recent advances in the use of Gaussian Processes for bandit problems. We assume that the function to maximise on the ...

Louis Dorard, John Shawe-Taylor

claim paper

Read More »

« Prev « First page 1 / 2 Last » Next »

Sciweavers

Explore & Download

Productivity Tools

Document Tools

Image Tools

Sciweavers