Sciweavers

55 search results for "Approximate Policy Iteration using Large-Margin Classifiers" (page 9 of 11)
UAI
2001
13 years 7 months ago
Expectation Propagation for approximate Bayesian inference
This paper presents a new deterministic approximation technique in Bayesian networks. This method, "Expectation Propagation," unifies two previous techniques: assumed-de...
Thomas P. Minka
CORR
2006
Springer
13 years 6 months ago
A Unified View of TD Algorithms; Introducing Full-Gradient TD and Equi-Gradient Descent TD
This paper addresses the issue of policy evaluation in Markov Decision Processes, using linear function approximation. It provides a unified view of algorithms such as TD(λ), LSTD(λ)...
Manuel Loth, Philippe Preux
NIPS
2007
13 years 7 months ago
Incremental Natural Actor-Critic Algorithms
We present four new reinforcement learning algorithms based on actor-critic and natural-gradient ideas, and provide their convergence proofs. Actor-critic reinforcement learning m...
Shalabh Bhatnagar, Richard S. Sutton, Mohammad Gha...
ICML
1999
IEEE
14 years 7 months ago
Least-Squares Temporal Difference Learning
Excerpted from: Boyan, Justin. Learning Evaluation Functions for Global Optimization. Ph.D. thesis, Carnegie Mellon University, August 1998. (Available as Technical Report CMU-CS-...
Justin A. Boyan
IJCAI
2007
13 years 7 months ago
A Fast Analytical Algorithm for Solving Markov Decision Processes with Real-Valued Resources
Agents often have to construct plans that obey deadlines or, more generally, resource limits for real-valued resources whose consumption can only be characterized by probability d...
Janusz Marecki, Sven Koenig, Milind Tambe