We present four new reinforcement learning algorithms based on actor-critic and natural-gradient ideas, and provide their convergence proofs. Actor-critic reinforcement learning m...
Shalabh Bhatnagar, Richard S. Sutton, Mohammad Gha...
We consider the problem of minimizing the total weighted completion time on identical parallel machines when jobs have stochastic processing times and may arrive over time. We give...
In this paper we consider on-line disjoint path routing in energy-constrained ad hoc networks. The objective is to maximize the network capacity, i.e., to maximize the number of messa...
In hypercube packing, we receive a sequence of hypercubes that need to be packed into unit hypercubes, which are called bins. Items arrive online and each item must be placed withi...
We study a class of scheduling problems with batch setups for the online-list and online-time paradigms. Jobs are to be scheduled in batches for processing. All jobs in a batch st...