Sciweavers

18 search results - page 2 / 4
» Incremental Least Squares Policy Iteration for POMDPs
GRC
2008
IEEE
13 years 6 months ago
Adaptive and Iterative Least Squares Support Vector Regression based on Quadratic Renyi Entropy
An adaptive and iterative LSSVR algorithm based on quadratic Renyi entropy is presented in this paper. LS-SVM loses the sparseness of support vector which is one of the important ...
Jingqing Jiang, Chuyi Song, Haiyan Zhao, Chunguo W...
ML
2002
ACM
13 years 4 months ago
Technical Update: Least-Squares Temporal Difference Learning
TD(λ) is a popular family of algorithms for approximate policy evaluation in large MDPs. TD(λ) works by incrementally updating the value function after each observed transition. It h...
Justin A. Boyan
AAAI
2007
13 years 7 months ago
Point-Based Policy Iteration
We describe a point-based policy iteration (PBPI) algorithm for infinite-horizon POMDPs. PBPI replaces the exact policy improvement step of Hansen’s policy iteration with point...
Shihao Ji, Ronald Parr, Hui Li, Xuejun Liao, Lawre...
ICML
2010
IEEE
13 years 6 months ago
Convergence of Least Squares Temporal Difference Methods Under General Conditions
We consider approximate policy evaluation for finite state and action Markov decision processes (MDP) in the off-policy learning context and with the simulation-based least square...
Huizhen Yu