

ATAL
2008
Springer


Sigma point policy iteration


In reinforcement learning, least-squares temporal difference methods (e.g., LSTD and LSPI) are effective, data-efficient techniques for policy evaluation and control with linear value function approximation. These algorithms rely on policy-dependent expectations of the transition and reward functions, which require all experience to be remembered and iterated over for each new policy evaluated. We propose to summarize experience with a compact policy-independent Gaussian model. We show how this policy-independent model can be transformed into a policy-dependent form and used to perform policy evaluation. Because closed-form transformations are rarely available, we introduce an efficient sigma point approximation. We show that the resulting Sigma-Point Policy Iteration algorithm (SPPI) is mathematically equivalent to LSPI for tabular representations and empirically demonstrate comparable performance for approximate representations. However, the experience does not need to be saved or re...
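The abstract's key step, approximating what happens to a Gaussian when it is pushed through a transformation that has no closed form, is commonly handled with sigma points. The sketch below is the generic unscented transform, not SPPI itself: this page does not say how SPPI chooses its sigma points or weights, so the 2n+1 symmetric point scheme, the scaling parameter `kappa`, and the helper names (`cholesky`, `sigma_points`, `unscented_transform`) are illustrative assumptions. It propagates a small, deterministically chosen set of points through a function and recovers an estimate of the mean and covariance of the result.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def cholesky(a: Matrix) -> Matrix:
    """Return lower-triangular L with L L^T = a (a must be symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sigma_points(mean: Vector, cov: Matrix, kappa: float = 1.0) -> Tuple[List[Vector], List[float]]:
    """Generic 2n+1 symmetric sigma points (an assumed scheme, not necessarily SPPI's)."""
    n = len(mean)
    # Columns of a square root of (n + kappa) * cov give the spread directions.
    root = cholesky([[(n + kappa) * c for c in row] for row in cov])
    points = [list(mean)]
    weights = [kappa / (n + kappa)]
    for j in range(n):
        col = [root[i][j] for i in range(n)]
        points.append([m + c for m, c in zip(mean, col)])
        points.append([m - c for m, c in zip(mean, col)])
        weights += [1.0 / (2.0 * (n + kappa))] * 2
    return points, weights


def unscented_transform(
    f: Callable[[Vector], Vector], mean: Vector, cov: Matrix, kappa: float = 1.0
) -> Tuple[Vector, Matrix]:
    """Approximate the mean and covariance of f(x) for x ~ N(mean, cov)."""
    points, weights = sigma_points(mean, cov, kappa)
    ys = [f(p) for p in points]  # push each sigma point through f
    m = len(ys[0])
    y_mean = [sum(w * y[i] for w, y in zip(weights, ys)) for i in range(m)]
    y_cov = [
        [sum(w * (y[i] - y_mean[i]) * (y[j] - y_mean[j]) for w, y in zip(weights, ys))
         for j in range(m)]
        for i in range(m)
    ]
    return y_mean, y_cov
```

For a linear function the sigma-point estimate is exact: mapping a 2-D Gaussian with mean (1, 2) and covariance [[2, 0.5], [0.5, 1]] through f(x) = 2x0 + x1 yields mean 4 and variance 11, up to floating-point rounding. For nonlinear functions the result is only an approximation, traded for needing just 2n+1 function evaluations.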

Michael H. Bowling, Alborz Geramifard, David Wingate


ATAL 2008 | Intelligent Agents | Policy | Policy Evaluation | Reinforcement Learning


Related Content

» Point-Based Policy Iteration

» Approximate Policy Iteration with a Policy Language Bias

» Incremental Least Squares Policy Iteration for POMDPs

» Adaptive Sum Power Iterative Waterfilling for MIMO Cognitive Radio Channels

» Near-Optimal Data Dissemination Policies for Multi-Channel Single Radio Wireless Sensor Netw...

» On Finding Compromise Solutions in Multiobjective Markov Decision Processes

» Introducing Communication in Dis-POMDPs with Locality of Interaction

» Optimum Power Allocation for Single-User MIMO and Multi-User MIMO-MAC with Partial CSI

» Bipedal walking on rough terrain using manifold control

Post Info

Added	12 Oct 2010
Updated	12 Oct 2010
Type	Conference
Year	2008
Where	ATAL
Authors	Michael H. Bowling, Alborz Geramifard, David Wingate
