Sciweavers

ICML
2007
IEEE

Reinforcement learning by reward-weighted regression for operational space control

Download www.machinelearning.org

Many robot control problems of practical importance, including operational space control, can be reformulated as immediate-reward reinforcement learning problems. However, few of the known optimization or reinforcement learning algorithms can be used in online learning control for robots, as they are either prohibitively slow, do not scale to interesting domains of complex robots, or require trying out policies generated by random search, which is infeasible for a physical system. Using a generalization of the EM-based reinforcement learning framework suggested by Dayan & Hinton, we reduce the problem of learning with immediate rewards to a reward-weighted regression problem with an adaptive, integrated reward transformation for faster convergence. The resulting algorithm is efficient, learns smoothly without dangerous jumps in solution space, and works well in applications of complex high degree-of-freedom robots.
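
The idea of reducing immediate-reward learning to a weighted regression can be illustrated with a minimal sketch. It is not the authors' implementation: the linear policy, the fixed exponential reward transformation with temperature `beta`, the ridge term, and the toy task in `run_demo` (where the best action is assumed to be `2 * state`) are all assumptions made for illustration, and the adaptive reward transformation mentioned in the abstract is not reproduced.

```python
import numpy as np


def reward_weighted_regression(features, actions, rewards, beta=5.0, ridge=1e-6):
    """One weighted least-squares update of linear policy parameters.

    Assumption: rewards are turned into non-negative weights with a fixed
    exponential transformation exp(beta * r); beta is a hypothetical,
    hand-picked temperature, not an adaptive one.
    """
    features = np.asarray(features, dtype=float)   # shape (n, d)
    actions = np.asarray(actions, dtype=float)     # shape (n,)
    rewards = np.asarray(rewards, dtype=float)     # shape (n,)

    # Subtracting the maximum keeps the exponent numerically stable.
    weights = np.exp(beta * (rewards - rewards.max()))

    # Solve (Phi^T W Phi + ridge * I) theta = Phi^T W u.
    weighted = features * weights[:, None]
    gram = features.T @ weighted + ridge * np.eye(features.shape[1])
    return np.linalg.solve(gram, weighted.T @ actions)


def run_demo(iterations=20, samples=500, noise_std=0.5, seed=0):
    """Toy immediate-reward task (an assumption): reward = -(a - 2 s)^2."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(1)
    for _ in range(iterations):
        states = rng.uniform(-1.0, 1.0, size=samples)
        features = states[:, None]
        # Explore around the current policy with Gaussian noise.
        actions = features @ theta + noise_std * rng.normal(size=samples)
        rewards = -(actions - 2.0 * states) ** 2
        # Refit the policy, trusting high-reward samples more.
        theta = reward_weighted_regression(features, actions, rewards)
    return theta


if __name__ == "__main__":
    print(run_demo())
```

In this sketch each update only reweights actions that the current policy already produced, so the parameters move gradually toward better-rewarded behavior rather than jumping to an arbitrary new policy.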

Jan Peters, Stefan Schaal

ICML 2007 | Machine Learning | Reinforcement Learning Algorithms | Reinforcement Learning Framework | Reinforcement Learning Problems

Related Content

» Reinforcement Learning for Operational Space Control

» Learning to Control in Operational Space

» Automated Design of Adaptive Controllers for Modular Robots using Reinforcement Learning

» Optimization on a Budget A Reinforcement Learning Approach

» Reinforcement Learning Hierarchical NeuroFuzzy Politree Model for Control of Autonomous Ag...

» Learning nonparametric policies by imitation

» DecisionTheoretic Control of Planetary Rovers

» Learning Evaluation Functions for Large Acyclic Domains

» Hyperellipsoidal conditions in XCS rotation linear approximation and solution structure

Post Info

Added	17 Nov 2009
Updated	17 Nov 2009
Type	Conference
Year	2007
Where	ICML
Authors	Jan Peters, Stefan Schaal
