Finite-Time Bounds for Fitted Value Iteration

12 years 5 months ago
Finite-Time Bounds for Fitted Value Iteration
In this paper we develop a theoretical analysis of the performance of sampling-based fitted value iteration (FVI) to solve infinite state-space, discounted-reward Markovian decision processes (MDPs) under the assumption that a generative model of the environment is available. Our main results come in the form of finite-time bounds on the performance of two versions of sampling-based FVI. The convergence rate results obtained allow us to show that both versions of FVI are well behaving in the sense that by using a sufficiently large number of samples for a large class of MDPs, arbitrary good performance can be achieved with high probability. An important feature of our proof technique is that it permits the study of weighted Lp-norm performance bounds. As a result, our technique applies to a large class of function-approximation methods (e.g., neural networks, adaptive regression trees, kernel machines, locally weighted learning), and our bounds scale well with the effective horizon of...
Rémi Munos, Csaba Szepesvári
Added 13 Dec 2010
Updated 13 Dec 2010
Type Journal
Year 2008
Where JMLR
Authors Rémi Munos, Csaba Szepesvári
Comments (0)