Hierarchically Optimal Average Reward Reinforcement Learning

Two notions of optimality have been explored in previous work on hierarchical reinforcement learning (HRL): hierarchical optimality, the optimal policy in the space defined by a task hierarchy, and recursive optimality, a weaker, local notion. In this paper, we introduce two new average-reward HRL algorithms for finding hierarchically optimal policies. We compare them to our previously reported algorithms for computing recursively optimal policies, using a grid-world taxi problem and a more realistic automated guided vehicle (AGV) scheduling problem. The new algorithms are based on a three-part value function decomposition proposed recently by Andre and Russell, which generalizes Dietterich's MAXQ value function decomposition. A key difference between the algorithms proposed in this paper and our previous work is that there is only a single global gain (average reward), instead of a gain for each subtask. Our results show the new average-reward algorithms perform better than both the previ...
Added: 17 Nov 2009
Updated: 17 Nov 2009
Type: Conference
Year: 2002
Where: ICML
Authors: Mohammad Ghavamzadeh, Sridhar Mahadevan