Sciweavers

102 search results - page 3 / 21
» MDPs with Non-Deterministic Policies
Sort
View
UAI
2000
13 years 7 months ago
PEGASUS: A policy search method for large MDPs and POMDPs
We propose a new approach to the problem of searching a space of policies for a Markov decision process (MDP) or a partially observable Markov decision process (POMDP), given a mo...
Andrew Y. Ng, Michael I. Jordan
AAAI
2010
13 years 7 months ago
Using Bisimulation for Policy Transfer in MDPs
Knowledge transfer has been suggested as a useful approach for solving large Markov Decision Processes. The main idea is to compute a decision-making policy in one environment and...
Pablo Samuel Castro, Doina Precup
JMLR
2010
189views more  JMLR 2010»
13 years 12 days ago
Adaptive Step-size Policy Gradients with Average Reward Metric
In this paper, we propose a novel adaptive step-size approach for policy gradient reinforcement learning. A new metric is defined for policy gradients that measures the effect of ...
Takamitsu Matsubara, Tetsuro Morimura, Jun Morimot...
AAAI
2006
13 years 7 months ago
Targeting Specific Distributions of Trajectories in MDPs
We define TTD-MDPs, a novel class of Markov decision processes where the traditional goal of an agent is changed from finding an optimal trajectory through a state space to realiz...
David L. Roberts, Mark J. Nelson, Charles Lee Isbe...
ECAI
2004
Springer
13 years 11 months ago
On-Line Search for Solving Markov Decision Processes via Heuristic Sampling
In the past, Markov Decision Processes (MDPs) have become a standard for solving problems of sequential decision under uncertainty. The usual request in this framework is the compu...
Laurent Péret, Frédérick Garc...