Sciweavers

6 search results - page 1 / 2
» Least absolute policy iteration for robust value function ap...
Sort
View
ICRA
2009
IEEE
143views Robotics» more  ICRA 2009»
13 years 10 months ago
Least absolute policy iteration for robust value function approximation
Abstract— Least-squares policy iteration is a useful reinforcement learning method in robotics due to its computational efficiency. However, it tends to be sensitive to outliers...
Masashi Sugiyama, Hirotaka Hachiya, Hisashi Kashim...
AAAI
2007
13 years 6 months ago
Point-Based Policy Iteration
We describe a point-based policy iteration (PBPI) algorithm for infinite-horizon POMDPs. PBPI replaces the exact policy improvement step of Hansen’s policy iteration with point...
Shihao Ji, Ronald Parr, Hui Li, Xuejun Liao, Lawre...
PKDD
2009
Springer
169views Data Mining» more  PKDD 2009»
13 years 10 months ago
Hybrid Least-Squares Algorithms for Approximate Policy Evaluation
The goal of approximate policy evaluation is to “best” represent a target value function according to a specific criterion. Temporal difference methods and Bellman residual m...
Jeffrey Johns, Marek Petrik, Sridhar Mahadevan
ICML
1999
IEEE
14 years 4 months ago
Least-Squares Temporal Difference Learning
Excerpted from: Boyan, Justin. Learning Evaluation Functions for Global Optimization. Ph.D. thesis, Carnegie Mellon University, August 1998. (Available as Technical Report CMU-CS-...
Justin A. Boyan
CORR
2010
Springer
170views Education» more  CORR 2010»
13 years 3 months ago
Global Optimization for Value Function Approximation
Existing value function approximation methods have been successfully used in many applications, but they often lack useful a priori error bounds. We propose a new approximate bili...
Marek Petrik, Shlomo Zilberstein