Sciweavers

Search results for "MDPs with Non-Deterministic Policies" (102 results, page 3 of 21)
UAI
2000
14 years 10 months ago
PEGASUS: A policy search method for large MDPs and POMDPs
We propose a new approach to the problem of searching a space of policies for a Markov decision process (MDP) or a partially observable Markov decision process (POMDP), given a mo...
Andrew Y. Ng, Michael I. Jordan
AAAI
2010
14 years 11 months ago
Using Bisimulation for Policy Transfer in MDPs
Knowledge transfer has been suggested as a useful approach for solving large Markov Decision Processes. The main idea is to compute a decision-making policy in one environment and...
Pablo Samuel Castro, Doina Precup
JMLR
2010
14 years 4 months ago
Adaptive Step-size Policy Gradients with Average Reward Metric
In this paper, we propose a novel adaptive step-size approach for policy gradient reinforcement learning. A new metric is defined for policy gradients that measures the effect of ...
Takamitsu Matsubara, Tetsuro Morimura, Jun Morimot...
AAAI
2006
14 years 11 months ago
Targeting Specific Distributions of Trajectories in MDPs
We define TTD-MDPs, a novel class of Markov decision processes where the traditional goal of an agent is changed from finding an optimal trajectory through a state space to realiz...
David L. Roberts, Mark J. Nelson, Charles Lee Isbe...
ECAI
2004
Springer
15 years 2 months ago
On-Line Search for Solving Markov Decision Processes via Heuristic Sampling
In the past, Markov Decision Processes (MDPs) have become a standard for solving problems of sequential decision under uncertainty. The usual request in this framework is the compu...
Laurent Péret, Frédérick Garc...