An adaptive and iterative LSSVR algorithm based on quadratic Renyi entropy is presented in this paper. LS-SVM loses the sparseness of support vector which is one of the important ...
TD() is a popular family of algorithms for approximate policy evaluation in large MDPs. TD() works by incrementally updating the value function after each observed transition. It h...
We describe a point-based policy iteration (PBPI) algorithm for infinite-horizon POMDPs. PBPI replaces the exact policy improvement step of Hansen’s policy iteration with point...
Shihao Ji, Ronald Parr, Hui Li, Xuejun Liao, Lawre...
We consider approximate policy evaluation for finite state and action Markov decision processes (MDP) in the off-policy learning context and with the simulation-based least square...