We investigate the functions of given approximation order L that have the smallest support. Those are shown to be linear combinations of the B-spline of degree L - 1 and ...
We present the first temporal-difference learning algorithm for off-policy control with unrestricted linear function approximation whose per-time-step complexity is linear in the ...
We consider a portfolio allocation problem where the objective function is a tail event such as the probability of large portfolio losses. The dependence between assets is captured th...
While process variations are becoming more significant with each new IC technology generation, they are often modeled via linear regression models so that the resulting performanc...
Xin Li, Jiayong Le, Padmini Gopalakrishnan, Lawren...
In this paper, we consider the problem of planning and learning in infinite-horizon discounted-reward Markov decision problems. We propose a novel iterative direct policy searc...