Markov decision processes (MDPs) are an established framework for solving sequential decision-making problems under uncertainty. In this work, we propose a new method for batchmod...
tion Learning about Temporally Abstract Actions Richard S. Sutton Department of Computer Science University of Massachusetts Amherst, MA 01003-4610 rich@cs.umass.edu Doina Precup D...
Richard S. Sutton, Doina Precup, Satinder P. Singh
Abstract. This paper presents anovel variational method forimage segmentation that uni es boundary and region-based information sources under the Geodesic Active Region framework. ...
We introduce the first algorithm for off-policy temporal-difference learning that is stable with linear function approximation. Off-policy learning is of interest because it forms...
—We propose a steepest descent method to compute optimal control parameters for balancing between multiple performance objectives in stateless stochastic scheduling, wherein the ...
Chris Y. T. Ma, David K. Y. Yau, Nung Kwan Yip, Na...