Sciweavers

45 search results - page 7 / 9
» Efficient exploration through active learning for value func...
ICMLA
2010
13 years 3 months ago
Multimodal Parameter-exploring Policy Gradients
Policy Gradients with Parameter-based Exploration (PGPE) is a novel model-free reinforcement learning method that alleviates the problem of high-variance gradient estima...
Frank Sehnke, Alex Graves, Christian Osendorfer, J...
ICML
2009
IEEE
14 years 6 months ago
Binary action search for learning continuous-action control policies
Reinforcement Learning methods for controlling stochastic processes typically assume a small and discrete action space. While continuous action spaces are quite common in real-wor...
Jason Pazis, Michail G. Lagoudakis
JMLR
2008
13 years 5 months ago
Maximal Causes for Non-linear Component Extraction
We study a generative model in which hidden causes combine competitively to produce observations. Multiple active causes combine to determine the value of an observed variable thr...
Jörg Lücke, Maneesh Sahani
ICML
1999
IEEE
14 years 6 months ago
Least-Squares Temporal Difference Learning
Excerpted from: Boyan, Justin. Learning Evaluation Functions for Global Optimization. Ph.D. thesis, Carnegie Mellon University, August 1998. (Available as Technical Report CMU-CS-...
Justin A. Boyan
ICML
1998
IEEE
14 years 6 months ago
Intra-Option Learning about Temporally Abstract Actions
Intra-Option Learning about Temporally Abstract Actions. Richard S. Sutton, Department of Computer Science, University of Massachusetts, Amherst, MA 01003-4610, rich@cs.umass.edu. Doina Precup, D...
Richard S. Sutton, Doina Precup, Satinder P. Singh