
AAAI
2007

A Reinforcement Learning Algorithm with Polynomial Interaction Complexity for Only-Costly-Observable MDPs

An Unobservable MDP (UMDP) is a POMDP in which there are no observations. An Only-Costly-Observable MDP (OCOMDP) is a POMDP that extends a UMDP by allowing a particular costly action that completely observes the state. We introduce UR-MAX, a reinforcement learning algorithm with polynomial interaction complexity for unknown OCOMDPs.
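To make the OCOMDP definition concrete, here is a minimal Python sketch of such an environment. It is an illustration only, not the paper's formalism and not UR-MAX: the class name `OCOMDP`, the action label `"observe"`, the fixed `observe_cost`, and the choice that observing leaves the state unchanged are all assumptions made for this example.

```python
import random


class OCOMDP:
    """Toy only-costly-observable MDP (illustrative sketch, not from the paper)."""

    OBSERVE = "observe"  # hypothetical label for the costly observing action

    def __init__(self, transitions, rewards, observe_cost, start_state, seed=0):
        # transitions[state][action] -> list of (next_state, probability)
        # rewards[state][action]     -> immediate reward
        self.transitions = transitions
        self.rewards = rewards
        self.observe_cost = observe_cost
        self._state = start_state  # hidden from the agent
        self._rng = random.Random(seed)

    def step(self, action):
        """Return (observation, reward); observation is None unless observing."""
        if action == self.OBSERVE:
            # Assumption: observing reveals the state, costs observe_cost,
            # and does not change the state.
            return self._state, -self.observe_cost
        reward = self.rewards[self._state][action]
        next_states, probs = zip(*self.transitions[self._state][action])
        self._state = self._rng.choices(next_states, weights=probs)[0]
        return None, reward  # ordinary actions yield no observation


# Two-state example with deterministic transitions.
transitions = {0: {"go": [(1, 1.0)]}, 1: {"go": [(0, 1.0)]}}
rewards = {0: {"go": 0.0}, 1: {"go": 1.0}}
env = OCOMDP(transitions, rewards, observe_cost=0.5, start_state=0)

print(env.step("go"))       # no observation is returned
print(env.step("observe"))  # state is revealed at a cost
```

Dropping the `OBSERVE` branch from `step` gives the UMDP case, in which the agent never receives any observation.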
Roy Fox, Moshe Tennenholtz
Added 02 Oct 2010
Updated 02 Oct 2010
Type Conference