Policy gradient methods for reinforcement learning avoid some of the undesirable properties of the value function approaches, such as policy degradation (Baxter and Bartlett, 2001...
Evan Greensmith, Peter L. Bartlett, Jonathan Baxte...
A central problem in learning in complex environments is balancing exploration of untested actions against exploitation of actions that are known to be good. The benefit of explora...
Recent work has exploited boundedness of data in the unsupervised learning of new types of generative model. For nonnegative data it was recently shown that the maximum-entropy ge...
Discrete-event simulation is widely used to analyse and improve the performance of manufacturing systems. The related optimization problem often includes integer design variables ...
S. J. Abspoel, L. F. P. Etman, J. Vervoort, J. E. ...
In designing autonomous agents that deal competently with issues involving time and space, there is a tradeoff to be made between guaranteed response-time reactions on the one han...