L1 regularization is effective for feature selection, but the resulting optimization is challenging due to the non-differentiability of the 1-norm. In this paper we compare state...
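As background for the non-differentiability mentioned above, one common workaround is proximal gradient descent, where the 1-norm is handled by a soft-thresholding step rather than by a gradient. The sketch below is only an illustration under assumptions: it is not taken from the paper and is not necessarily one of the methods the paper compares. The names `soft_threshold` and `ista`, the least-squares loss, and the fixed step size are all hypothetical choices made for this example.

```python
def soft_threshold(x, t):
    """Proximal operator of t * |x|: shrink x toward zero by t."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def ista(A, b, lam, step, iters=500):
    """Approximately minimize 0.5 * ||A w - b||^2 + lam * ||w||_1.

    Assumed setup (not from the source): A is a list of rows, b a list,
    and step is at most 1 / (largest eigenvalue of A^T A).
    """
    n_rows, n_cols = len(A), len(A[0])
    w = [0.0] * n_cols
    for _ in range(iters):
        # Residual of the smooth least-squares part: r = A w - b
        r = [sum(A[i][j] * w[j] for j in range(n_cols)) - b[i]
             for i in range(n_rows)]
        # Gradient of the smooth part: g = A^T r
        g = [sum(A[i][j] * r[i] for i in range(n_rows))
             for j in range(n_cols)]
        # Gradient step on the smooth part, then the prox of the 1-norm
        w = [soft_threshold(w[j] - step * g[j], step * lam)
             for j in range(n_cols)]
    return w


# Toy usage: with A = identity the solution is soft_threshold(b, lam),
# so the small second coefficient is set exactly to zero.
w = ista([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.5], lam=1.0, step=1.0)
```

The exact zero in the second coordinate is what makes L1 penalties useful for feature selection: the thresholding step removes small coefficients outright instead of merely shrinking them.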
In this paper we propose differential eligibility vectors (DEV) for temporal-difference (TD) learning, a new class of eligibility vectors designed to bring out the contribution of...
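For context, the eligibility vector in conventional linear TD(λ) can be sketched as below. This shows the standard accumulating-trace update only; it is not the DEV construction the abstract proposes, whose definition is not given in this excerpt. The function name `td_lambda_episode`, the transition format, and the default hyperparameters are assumptions made for illustration.

```python
def td_lambda_episode(transitions, theta, features,
                      alpha=0.1, gamma=0.9, lam=0.8):
    """One pass of linear TD(lambda) with accumulating eligibility traces.

    transitions: list of (state, reward, next_state), with next_state set
    to None at termination (assumed format, not from the source).
    features: function mapping a state to a list of floats.
    """
    z = [0.0] * len(theta)  # eligibility vector, reset at episode start
    for s, r, s_next in transitions:
        phi = features(s)
        v = sum(t * p for t, p in zip(theta, phi))
        if s_next is None:
            v_next = 0.0  # terminal states have zero value
        else:
            v_next = sum(t * p for t, p in zip(theta, features(s_next)))
        delta = r + gamma * v_next - v  # TD error
        # Decay the old trace and add the current feature vector
        z = [gamma * lam * zi + pi for zi, pi in zip(z, phi)]
        # Credit the TD error to every recently active feature
        theta = [ti + alpha * delta * zi for ti, zi in zip(theta, z)]
    return theta


def one_hot(s):
    """Tabular features for a two-state toy chain."""
    return [1.0 if i == s else 0.0 for i in range(2)]


# Toy chain: state 0 -> state 1 -> terminal, reward 1 on the last step.
episode = [(0, 0.0, 1), (1, 1.0, None)]
theta_full = td_lambda_episode(episode, [0.0, 0.0], one_hot,
                               alpha=0.5, gamma=1.0, lam=1.0)
theta_zero = td_lambda_episode(episode, [0.0, 0.0], one_hot,
                               alpha=0.5, gamma=1.0, lam=0.0)
```

With λ = 1 the final reward is credited back to state 0 through the trace, while with λ = 0 only the last visited state is updated.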
In this paper we consider approximate policy-iteration-based reinforcement learning algorithms. In order to implement a flexible function approximation scheme we propose the use o...
Amir Massoud Farahmand, Mohammad Ghavamzadeh, Csab...
Currently, there is a lack of general-purpose in-place learning networks that model feature layers in the cortex. By "general-purpose" we mean a general yet adaptive hig...
Juyang Weng, Tianyu Luwang, Hong Lu, Xiangyang Xue
The asymptotic behavior of stochastic gradient algorithms is studied. Relying on some results of differential geometry (Lojasiewicz gradient inequality), the almost sure point-conve...