In previous work we have investigated a notion of approximate bisimilarity for labelled Markov processes. We argued that such a notion is more realistic and more feasible to compu...
Consider a hidden Markov chain obtained as the observation process of an ordinary Markov chain corrupted by noise. Zuk et al. [13, 14] showed how, in principle, one can explicit...
We consider approximate policy evaluation for finite state and action Markov decision processes (MDPs) in the off-policy learning context and with the simulation-based least square...
Policy gradient methods for reinforcement learning avoid some of the undesirable properties of the value function approaches, such as policy degradation (Baxter and Bartlett, 2001...
Evan Greensmith, Peter L. Bartlett, Jonathan Baxte...
Hidden Markov models (HMMs) have received considerable attention in various communities (e.g., speech recognition, neurology and bioinformatics) since many applications that use HMM...