Reinforcement Learning Via Practice and Critique Advice

We consider the problem of incorporating end-user advice into reinforcement learning (RL). In our setting, the learner alternates between practicing, where learning is based on actual world experience, and end-user critique sessions where advice is gathered. During each critique session the end-user is allowed to analyze a trajectory of the current policy and then label an arbitrary subset of the available actions as good or bad. Our main contribution is an approach for integrating all of the information gathered during practice and critiques in order to effectively optimize a parametric policy. The approach optimizes a loss function that linearly combines losses measured against the world experience and the critique data. We evaluate our approach using a prototype system for teaching tactical battle behavior in a real-time strategy game engine. Results are given for a significant evaluation involving ten end-users showing the promise of this approach and also highlighting challenges ...
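The combined objective described in the abstract can be illustrated with a minimal sketch. The abstract only states that the loss linearly combines a loss on world experience with a loss on critique data; everything else here is an assumption made for illustration: the trade-off weight `lam`, the linear softmax policy, the specific critique loss (negative log-probability mass on actions labeled good, with bad labels omitted for brevity), and all function names. It is not the authors' formulation.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of action scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def critique_loss(theta, critiques):
    """Assumed form: mean negative log-probability mass on actions labeled good.

    Each critique is (action_features, good_indices), where action_features
    holds one feature vector per available action in the critiqued state.
    """
    total = 0.0
    for action_features, good_indices in critiques:
        # Linear score per action under the hypothetical parametric policy.
        scores = [sum(w * x for w, x in zip(theta, feats)) for feats in action_features]
        probs = softmax(scores)
        total -= math.log(sum(probs[i] for i in good_indices))
    return total / len(critiques)

def combined_loss(theta, lam, experience_loss, critiques):
    """Linear combination of an experience loss and the critique loss.

    lam in [0, 1] is a hypothetical trade-off weight; experience_loss is any
    callable theta -> float estimated from practice trajectories.
    """
    return lam * experience_loss(theta) + (1.0 - lam) * critique_loss(theta, critiques)

# Toy usage: one critiqued state with two actions, the first labeled good.
critiques = [([[1.0, 0.0], [0.0, 1.0]], [0])]
toy_experience_loss = lambda theta: sum(w * w for w in theta)  # stand-in only
value = combined_loss([0.5, -0.5], 0.5, toy_experience_loss, critiques)
```

Setting `lam` to 1 recovers learning from practice alone, and setting it to 0 relies on critiques alone; intermediate values trade the two sources off against each other.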

Kshitij Judah, Saikat Roy, Alan Fern, Thomas G. Dietterich

AAAI 2010 | Critique Session | End-user | End-user Critique Sessions | Intelligent Agents

Related Content

» Knowledge transfer via advice taking

» Skill Acquisition Via Transfer Learning and Advice Taking

» Reinforcement Learning via AIXI Approximation

» Improving reinforcement learning function approximators via neuroevolution

» Model-free reinforcement learning as mixture learning

» On step sizes stochastic shortest paths and survival probabilities in Reinforcement Learni...

» Empirical investigation throughout the CS curriculum

» Gaussian Processes for Sample Efficient Reinforcement Learning with RMAX-Like Exploration

» Sample-Efficient Evolutionary Function Approximation for Reinforcement Learning

Post Info

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2010
Where	AAAI
Authors	Kshitij Judah, Saikat Roy, Alan Fern, Thomas G. Dietterich
