Sciweavers

63
Voted
AAAI
2007

A Reinforcement Learning Algorithm with Polynomial Interaction Complexity for Only-Costly-Observable MDPs

15 years 23 hour ago
A Reinforcement Learning Algorithm with Polynomial Interaction Complexity for Only-Costly-Observable MDPs
An Unobservable MDP (UMDP) is a POMDP in which there are no observations. An Only-Costly-Observable MDP (OCOMDP) is a POMDP which extends an UMDP by allowing a particular costly action which completely observes the state. We introduce UR-MAX, a reinforcement learning algorithm with polynomial interaction complexity for unknown OCOMDPs.
Roy Fox, Moshe Tennenholtz
Added 02 Oct 2010
Updated 02 Oct 2010
Type Conference
Year 2007
Where AAAI
Authors Roy Fox, Moshe Tennenholtz
Comments (0)