Sciweavers

11 search results - page 3 / 3
» Learning in Reactive Environments with Arbitrary Dependence
Sort
View
JMLR
2008
129views more  JMLR 2008»
13 years 5 months ago
Finite-Time Bounds for Fitted Value Iteration
In this paper we develop a theoretical analysis of the performance of sampling-based fitted value iteration (FVI) to solve infinite state-space, discounted-reward Markovian decisi...
Rémi Munos, Csaba Szepesvári