This paper describes a new hybrid method based on the application of the Population Training Algorithm (PTA) and linear programming (LP) for generation of schedules for drivers in...
In reinforcement learning, least-squares temporal difference methods (e.g., LSTD and LSPI) are effective, data-efficient techniques for policy evaluation and control with linear v...
Michael H. Bowling, Alborz Geramifard, David Winga...
We develop a variant of the Nelder-Mead (NM) simplex search procedure for stochastic simulation optimization that is designed to avoid many of the weaknesses encumbering such dire...
We use simulated soccer to study multiagent learning. Each team's players (agents) share action set and policy, but may behave differently due to position-dependent inputs. All...
Crown structures in a graph are defined and shown to be useful in kernelization algorithms for the classic vertex cover problem. Two vertex cover kernelization methods are discus...
Faisal N. Abu-Khzam, Michael R. Fellows, Michael A...