Sciweavers

» Approximate Expectation Maximization
NIPS
2001
15 years 4 months ago
Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning
Policy gradient methods for reinforcement learning avoid some of the undesirable properties of the value function approaches, such as policy degradation (Baxter and Bartlett, 2001...
Evan Greensmith, Peter L. Bartlett, Jonathan Baxte...
AAAI
1998
15 years 4 months ago
Bayesian Q-Learning
A central problem in learning in complex environments is balancing exploration of untested actions against exploitation of actions that are known to be good. The benefit of explora...
Richard Dearden, Nir Friedman, Stuart J. Russell
NIPS
2000
15 years 4 months ago
High-temperature Expansions for Learning Models of Nonnegative Data
Recent work has exploited boundedness of data in the unsupervised learning of new types of generative model. For nonnegative data it was recently shown that the maximum-entropy ge...
Oliver B. Downs
WSC
2000
15 years 4 months ago
Simulation optimization of stochastic systems with integer variables by sequential linearization
Discrete-event simulation is widely used to analyse and improve the performance of manufacturing systems. The related optimization problem often includes integer design variables ...
S. J. Abspoel, L. F. P. Etman, J. Vervoort, J. E. ...
IJCAI
1989
15 years 4 months ago
A Model for Projection and Action
In designing autonomous agents that deal competently with issues involving time and space, there is a tradeoff to be made between guaranteed response-time reactions on the one han...
Keiji Kanazawa, Thomas Dean