The resource constraint project scheduling problem (RCPSP) is an NP-hard benchmark problem in scheduling which takes into account the limitation of resources’ availabilities in ...
Abstract. We consider batch reinforcement learning problems in continuous space, expected total discounted-reward Markovian Decision Problems. As opposed to previous theoretical wo...
Abstract-- Policy Gradients with Parameter-based Exploration (PGPE) is a novel model-free reinforcement learning method that alleviates the problem of high-variance gradient estima...
Frank Sehnke, Alex Graves, Christian Osendorfer, J...
Optical lithography is a critical step in the semiconductor manufacturing process, and one key problem is the design of the photomask for a particular circuit pattern, given the o...
In reinforcement learning, least-squares temporal difference methods (e.g., LSTD and LSPI) are effective, data-efficient techniques for policy evaluation and control with linear v...
Michael H. Bowling, Alborz Geramifard, David Winga...