Sciweavers

NIPS
2001
Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning
Policy gradient methods for reinforcement learning avoid some of the undesirable properties of the value function approaches, such as policy degradation (Baxter and Bartlett, 2001...
Evan Greensmith, Peter L. Bartlett, Jonathan Baxter
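
The technique at the heart of this paper is subtracting a baseline from the reward in a likelihood-ratio (REINFORCE) gradient estimate: a constant baseline leaves the estimate unbiased while it can sharply shrink the variance. A minimal sketch on a hypothetical two-armed bandit (Python with numpy; the arm means and baseline choice are illustrative, not the paper's):

    import numpy as np

    rng = np.random.default_rng(0)
    means = np.array([1.0, 1.2])      # hypothetical arm means
    theta = np.zeros(2)               # softmax policy parameters

    def sample_grad(baseline):
        p = np.exp(theta - theta.max()); p /= p.sum()
        a = rng.choice(2, p=p)
        r = means[a] + rng.normal()   # noisy reward
        g = -p; g[a] += 1.0           # gradient of log softmax
        return (r - baseline) * g

    no_base   = np.array([sample_grad(0.0) for _ in range(5000)])
    with_base = np.array([sample_grad(means.mean()) for _ in range(5000)])
    print("mean grad, no baseline:  ", no_base.mean(axis=0))
    print("mean grad, with baseline:", with_base.mean(axis=0))
    print("variance, no baseline:   ", no_base.var(axis=0))
    print("variance, with baseline: ", with_base.var(axis=0))

Both estimators agree in expectation; the baselined one has visibly smaller per-component variance because rewards are centered before being multiplied into the score function.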
ESANN
2003
Improving iterative repair strategies for scheduling with the SVM
The resource-constrained project scheduling problem (RCPSP) is an NP-hard benchmark problem in scheduling which takes into account the limited availability of resources in ...
Kai Gersmann, Barbara Hammer
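
As a rough illustration of iterative repair, the sketch below greedily applies local repair steps to a toy single-machine schedule until tardiness stops improving; in the paper, an SVM-learned value function replaces the exhaustive greedy choice of repair step. The problem data and the adjacent-swap repair are hypothetical stand-ins for the RCPSP:

    import random
    random.seed(0)

    jobs = [(3, 6), (2, 4), (4, 10), (1, 3), (2, 5)]  # hypothetical (duration, due date)

    def tardiness(order):
        t = total = 0
        for j in order:
            t += jobs[j][0]
            total += max(0, t - jobs[j][1])
        return total

    def repairs(order):
        for i in range(len(order) - 1):               # candidate repair: adjacent swap
            new = order[:]
            new[i], new[i + 1] = new[i + 1], new[i]
            yield new

    order = list(range(len(jobs)))
    random.shuffle(order)
    improved = True
    while improved:
        best = min(repairs(order), key=tardiness)     # a learned value would rank these
        improved = tardiness(best) < tardiness(order)
        if improved:
            order = best
    print(order, tardiness(order))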
IJCAI
2001
Symbolic Dynamic Programming for First-Order MDPs
We present a dynamic programming approach for the solution of first-order Markov decision processes. This technique uses an MDP whose dynamics is represented in a variant of the ...
Craig Boutilier, Raymond Reiter, Bob Price
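
The key device in symbolic dynamic programming is representing value functions as logical case statements (condition/value partitions) and performing Bellman backups on those partitions rather than on enumerated states. Below is a toy propositional cross-sum, one building block of such backups; the literals and values are invented for illustration:

    # A case statement is a list of (condition, value) partitions; literals
    # are strings, with "~" marking negation. Cross-sum combines two case
    # statements without ever enumerating individual states.
    def cross_sum(case1, case2):
        out = []
        for cond1, v1 in case1:
            for cond2, v2 in case2:
                cond = cond1 | cond2
                # skip partitions whose combined conditions are contradictory
                if any('~' + lit in cond for lit in cond if not lit.startswith('~')):
                    continue
                out.append((cond, v1 + v2))
        return out

    reward = [(frozenset({'goal'}), 10.0), (frozenset({'~goal'}), 0.0)]
    future = [(frozenset({'safe'}), 5.0), (frozenset({'~safe'}), 1.0)]
    for cond, v in cross_sum(reward, future):
        print(sorted(cond), '->', v)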
ESANN
2004
High-accuracy value-function approximation with neural networks applied to the acrobot
Several reinforcement-learning techniques have already been applied to the Acrobot control problem, using linear function approximators to estimate the value function. In this paper...
Rémi Coulom
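
The general recipe, sketched here on a toy 1-D chain rather than the Acrobot, is semi-gradient TD(0) with a small neural network in place of a linear approximator; the architecture and constants are assumptions, not the paper's:

    import numpy as np
    rng = np.random.default_rng(1)

    # One-hidden-layer net evaluating a fixed drift-right policy; reward 1
    # is received only on reaching the right end of the chain.
    W1 = rng.normal(0, 0.5, 16); b1 = np.zeros(16)
    W2 = rng.normal(0, 0.5, 16); b2 = 0.0

    def value(x):
        h = np.tanh(W1 * x + b1)
        return h @ W2 + b2, h

    alpha, gamma = 0.05, 0.99
    for _ in range(3000):
        x = rng.uniform(0.0, 0.9)
        done = False
        while not done:
            x2 = x + 0.1                      # fixed policy: step right
            done = x2 >= 1.0
            target = 1.0 if done else gamma * value(x2)[0]
            v, h = value(x)
            delta = target - v                # TD error
            grad_h = delta * W2 * (1 - h**2)  # backprop before updating W2
            W2 += alpha * delta * h; b2 += alpha * delta
            W1 += alpha * grad_h * x; b1 += alpha * grad_h
            x = x2

    print([round(value(x)[0], 2) for x in (0.1, 0.5, 0.9)])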
AAAI
2006
Sample-Efficient Evolutionary Function Approximation for Reinforcement Learning
Reinforcement learning problems are commonly tackled with temporal difference methods, which attempt to estimate the agent's optimal value function. In most real-world problems...
Shimon Whiteson, Peter Stone
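
A bare (mu+lambda) evolutionary loop over policy parameters, with fitness measured as average episodic return on a hypothetical two-armed bandit; the paper evolves whole neural-network approximators (NEAT) and refines them with TD updates, which this skeleton omits:

    import numpy as np
    rng = np.random.default_rng(2)

    def fitness(theta, episodes=30):
        p = 1.0 / (1.0 + np.exp(-theta))      # prob of pulling arm 1
        pulls = rng.random(episodes) < p
        rewards = np.where(pulls, rng.normal(1.2, 1.0, episodes),
                                  rng.normal(1.0, 1.0, episodes))
        return rewards.mean()                 # noisy episodic return

    pop = list(rng.normal(0.0, 1.0, 8))
    for gen in range(25):
        parents = sorted(pop, key=fitness, reverse=True)[:4]   # mu survivors
        pop = parents + [w + rng.normal(0.0, 0.3) for w in parents]  # lambda children

    print("best parameter:", max(pop, key=fitness))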
NIPS
2007
Random Sampling of States in Dynamic Programming
We combine three threads of research on approximate dynamic programming: sparse random sampling of states, value function and policy approximation using local models, and using lo...
Christopher G. Atkeson, Benjamin Stephens
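
A minimal version of the first two threads: value iteration over a sparse random sample of a continuous state space, with a nearest-neighbor local model filling in the value between samples. The dynamics, reward, and constants are hypothetical:

    import numpy as np
    rng = np.random.default_rng(3)

    goal, gamma = 0.8, 0.95
    S = np.sort(rng.uniform(0.0, 1.0, 200))   # randomly sampled states
    V = np.zeros_like(S)

    def v_hat(x):                             # local model: nearest sampled state
        return V[np.abs(S - x).argmin()]

    for _ in range(80):                       # value-iteration sweeps
        for i, x in enumerate(S):
            V[i] = max(-abs(np.clip(x + a, 0, 1) - goal)
                       + gamma * v_hat(np.clip(x + a, 0, 1))
                       for a in (-0.1, 0.1))

    print("V(goal) ~", round(v_hat(goal), 3), "  V(0) ~", round(v_hat(0.0), 3))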
ECAI
2008
Springer
Reinforcement Learning with the Use of Costly Features
In many practical reinforcement learning problems, the state space is too large to permit an exact representation of the value function, much less the time required to compute it. ...
Robby Goetschalckx, Scott Sanner, Kurt Driessens
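
The underlying trade-off is value of information: observing a feature is worthwhile only when the improved decision gains more than the observation costs. A toy check with invented numbers:

    p = 0.5        # prior: arm A pays 1, else arm B pays 1
    c = 0.2        # cost of observing the feature that reveals the paying arm

    ev_blind = max(p, 1.0 - p)    # act on the prior, observe nothing
    ev_observe = 1.0 - c          # pay c, then always pick the paying arm
    print("observe the feature" if ev_observe > ev_blind else "skip the feature")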
WSC
2007
Path-wise estimators and cross-path regressions: an application to evaluating portfolio strategies
Recently developed dual techniques allow us to evaluate a given sub-optimal dynamic portfolio policy by using the policy to construct an upper bound on the optimal value function....
Martin B. Haugh, Ashish Jain
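
The lower-bound half of this construction is plain path-wise Monte Carlo: simulate the sub-optimal policy and average the realized utility. A sketch under an invented one-period market model and a fixed 50/50 policy (the dual, penalized upper bound is not reproduced here):

    import numpy as np
    rng = np.random.default_rng(4)

    T, n_paths, mu, sigma = 10, 20000, 0.05, 0.2

    def run_path():
        wealth = 1.0
        for _ in range(T):
            risky = mu + sigma * rng.normal()   # one-period risky return
            wealth *= 1.0 + 0.5 * risky         # policy: 50% risky, 50% cash at 0%
        return np.log(wealth)                   # log utility of terminal wealth

    utilities = np.array([run_path() for _ in range(n_paths)])
    half_width = 1.96 * utilities.std() / np.sqrt(n_paths)
    print(f"lower bound on value: {utilities.mean():.4f} +/- {half_width:.4f}")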
AAAI
2008
A Variance Analysis for POMDP Policy Evaluation
Partially Observable Markov Decision Processes have been studied widely as a model for decision making under uncertainty, and a number of methods have been developed to find the s...
Mahdi Milani Fard, Joelle Pineau, Peng Sun
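
The motivating observation can be reproduced with rollouts: the discounted return of a fixed policy in a POMDP is a random variable whose variance matters as much as its mean. A toy two-state example with invented models (the paper derives the variance analytically rather than by sampling):

    import numpy as np
    rng = np.random.default_rng(5)

    gamma, horizon = 0.95, 30

    def rollout():
        s = rng.integers(2)                           # hidden state
        G = 0.0
        for t in range(horizon):
            obs = s if rng.random() < 0.8 else 1 - s  # noisy observation
            a = obs                                   # policy: trust last observation
            G += gamma**t * (1.0 if a == s else -1.0)
            if rng.random() < 0.1:
                s = 1 - s                             # hidden state drifts
        return G

    returns = np.array([rollout() for _ in range(5000)])
    print("value estimate:", returns.mean(), " return variance:", returns.var())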
PRICAI
2000
Springer
A POMDP Approximation Algorithm That Anticipates the Need to Observe
This paper introduces the even-odd POMDP, an approximation to POMDPs in which the world is assumed to be fully observable every other time step. The even-odd POMDP can be converted...
Valentina Bayer Zubek, Thomas G. Dietterich
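
Under the even-odd assumption, belief tracking alternates between a collapse to the true state (even steps, fully observable) and an ordinary Bayes filter step (odd steps). A toy two-state sketch with invented transition and observation models:

    import numpy as np
    rng = np.random.default_rng(6)

    T = np.array([[0.9, 0.1],
                  [0.2, 0.8]])          # P(s' | s)
    O = np.array([[0.7, 0.3],
                  [0.3, 0.7]])          # P(o | s)

    belief = np.array([0.5, 0.5])
    s = 0
    for t in range(6):
        s = rng.choice(2, p=T[s])       # world transitions
        belief = belief @ T             # prediction step
        if t % 2 == 0:                  # even step: state fully observed
            belief = np.eye(2)[s]
        else:                           # odd step: noisy observation, Bayes update
            o = rng.choice(2, p=O[s])
            belief = belief * O[:, o]
            belief /= belief.sum()
        print(t, belief.round(3))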