Safe Q-Learning on Complete History Spaces

In this article, we present an idea for solving deterministic partially observable Markov decision processes (POMDPs) based on a history space containing sequences of past observations and actions. A novel and sound technique for learning a Q-function on history spaces is developed and discussed. We analyze certain conditions under which a history-based approach is able to learn policies comparable to the optimal solution on belief states. The presented algorithm is model-free and can be combined with any method for learning history spaces. We also present a procedure for learning history spaces that are especially suited to our Q-learning algorithm.
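To make the idea of a Q-function on a history space concrete, the following is a minimal sketch and not the authors' algorithm: it is plain tabular Q-learning whose table is keyed by the full sequence of past observations and actions. The toy task, the names `run_episode`, `train`, and `greedy`, and the values of `alpha` and `gamma` are all assumptions made for illustration. In the toy task the first observation (`"L"` or `"R"`) reveals a hidden cue, the next observation (`"J"`) is the same for both cues, and the final action is rewarded only if it matches the cue.

```python
import random
from collections import defaultdict

ACTIONS = (0, 1)  # 0 = left, 1 = right (hypothetical toy task)


def run_episode(q, rng, alpha=0.5, gamma=1.0):
    """Run one two-step episode and apply Q-learning updates keyed by history."""
    cue = rng.choice(ACTIONS)  # hidden start state, never observed directly later
    history = ("L" if cue == 0 else "R",)  # first observation reveals the cue

    # Step 1: either action leads deterministically to the junction, observed as "J".
    a0 = rng.choice(ACTIONS)  # uniform random behaviour policy (Q-learning is off-policy)
    next_history = history + (a0, "J")
    target = 0.0 + gamma * max(q[next_history][a] for a in ACTIONS)
    q[history][a0] += alpha * (target - q[history][a0])

    # Step 2: the episode ends; reward 1 only if the action matches the hidden cue.
    a1 = rng.choice(ACTIONS)
    reward = 1.0 if a1 == cue else 0.0
    q[next_history][a1] += alpha * (reward - q[next_history][a1])


def train(episodes=500, seed=0):
    """Learn a Q-table whose keys are tuples of past observations and actions."""
    rng = random.Random(seed)
    q = defaultdict(lambda: [0.0, 0.0])
    for _ in range(episodes):
        run_episode(q, rng)
    return q


def greedy(q, history):
    """Return the action with the highest Q-value for the given history."""
    values = q[history]
    return max(ACTIONS, key=lambda a: values[a])
```

An agent that conditions only on the current observation `"J"` cannot tell the two cues apart, whereas the history `("L", a0, "J")` or `("R", a0, "J")` identifies the hidden state, so the greedy policy on histories can pick the rewarded action. Keeping complete histories is feasible here only because episodes are two steps long; how to learn a compact history space is the subject of the paper and is not attempted in this sketch.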

Stephan Timmer, Martin Riedmiller

ECML 2007 | History Spaces | Machine Learning | Observable Markov Decision | Space Containing Sequences |

Added	07 Jun 2010
Updated	07 Jun 2010
Type	Conference
Year	2007
Where	ECML
Authors	Stephan Timmer, Martin Riedmiller
