We describe a new approximation algorithm for solving partially observable MDPs. Our bounded policy iteration approach searches through the space of bounded-size, stochastic fini...
We study a sequential variance reduction technique for Monte Carlo estimation of functionals in Markov Chains. The method is based on designing sequential control variates using s...
It is well known that stochastic control systems can be viewed as Markov decision processes (MDPs) with continuous state spaces. In this paper, we propose to apply the policy iter...
Several researchers have recently investigated the connection between reinforcement learning and classification. We are motivated by proposals of approximate policy iteration schem...