Learning Without State-Estimation in Partially Observable Markovian Decision Processes

Reinforcement learning (RL) algorithms provide a sound theoretical basis for building learning control architectures for embedded agents. Unfortunately, all of the theory and much of the practice (see Barto et al., 1983, for an exception) of RL is limited to Markovian decision processes (MDPs). Many real-world decision tasks, however, are inherently non-Markovian, i.e., the state of the environment is only incompletely known to the learning agent. In this paper we consider only partially observable MDPs (POMDPs), a useful class of non-Markovian decision processes. Most previous approaches to such problems have combined computationally expensive state-estimation techniques with learning control. This paper investigates learning in POMDPs without resorting to any form of state estimation. We present results about what TD(0) and Q-learning will do when applied to POMDPs. It is shown that the conventional discounted RL framework is inadequate to deal with POMDPs. Finally we develop a new framework...
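As a concrete illustration of the setting the abstract describes, here is a minimal sketch of tabular Q-learning applied directly to observations rather than to hidden states, i.e., learning without state estimation. The toy environment and all names below are hypothetical illustrations, not code from the paper; the point is that two hidden states emit the same observation (perceptual aliasing), so the process the agent sees is non-Markovian.

```python
# Minimal sketch: memoryless (observation-indexed) Q-learning in a toy POMDP.
# Hypothetical illustration of the abstract's setting, not the authors' code.

import random
from collections import defaultdict


class TwoStateAliasedPOMDP:
    """Toy POMDP: two hidden states emit the same observation (aliasing)."""
    actions = (0, 1)

    def __init__(self):
        self.state = 0

    def reset(self):
        self.state = random.choice((0, 1))
        return self._observe()

    def _observe(self):
        return "o"  # both hidden states look identical to the agent

    def step(self, action):
        # Action 0 is rewarded in hidden state 0, action 1 in hidden state 1,
        # but the agent cannot tell which state it is in.
        reward = 1.0 if action == self.state else -1.0
        self.state = random.choice((0, 1))
        done = random.random() < 0.1  # episodes end with probability 0.1
        return self._observe(), reward, done


def q_learning_on_observations(env, episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Q-learning keyed by observation, not hidden state.

    Because distinct hidden states can share an observation, the observation
    process is generally non-Markovian, which is why the paper argues the
    conventional discounted framework can fail here.
    """
    Q = defaultdict(float)  # maps (observation, action) -> value estimate
    for _ in range(episodes):
        obs = env.reset()
        done = False
        while not done:
            # epsilon-greedy action selection over observation-action values
            if random.random() < epsilon:
                a = random.choice(env.actions)
            else:
                a = max(env.actions, key=lambda act: Q[(obs, act)])
            next_obs, reward, done = env.step(a)
            # TD(0)-style Q update, indexed by observations instead of states
            best_next = 0.0 if done else max(Q[(next_obs, act)] for act in env.actions)
            Q[(obs, a)] += alpha * (reward + gamma * best_next - Q[(obs, a)])
            obs = next_obs
    return Q


if __name__ == "__main__":
    Q = q_learning_on_observations(TwoStateAliasedPOMDP())
    print(dict(Q))  # both actions look equally good from the one aliased observation
```

In this toy, no deterministic memoryless policy can distinguish the two hidden states, which motivates the paper's move toward a new framework that admits stochastic policies.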
Type: Conference
Year: 1994
Where: ICML
Authors: Satinder P. Singh, Tommi Jaakkola, Michael I. Jordan