Search Sciweavers | Sciweavers

38 search results - page 3 / 8

» On the Convergence of Optimistic Policy Iteration

118

click to vote

UAI
2004

108views Artificial Intelligence» more UAI 2004»

Heuristic Search Value Iteration for POMDPs

15 years 3 months ago

Download www.cs.cmu.edu

We present a novel POMDP planning algorithm called heuristic search value iteration (HSVI). HSVI is an anytime algorithm that returns a policy and a provable bound on its regret w...

Trey Smith, Reid G. Simmons

claim paper

Read More »

115

Voted

CORR
2010
Springer

119views Education» more CORR 2010»

Dynamic Policy Programming

15 years 1 months ago

Download www.snn.ru.nl

In this paper, we consider the problem of planning and learning in the infinite-horizon discounted-reward Markov decision problems. We propose a novel iterative direct policysearc...

Mohammad Gheshlaghi Azar, Hilbert J. Kappen

claim paper

Read More »

132

Voted

COLT
2007
Springer

143views Machine Learning» more COLT 2007»

Bounded Parameter Markov Decision Processes with Average Reward Criterion

15 years 8 months ago

Download ttic.uchicago.edu

Bounded parameter Markov Decision Processes (BMDPs) address the issue of dealing with uncertainty in the parameters of a Markov Decision Process (MDP). Unlike the case of an MDP, t...

Ambuj Tewari, Peter L. Bartlett

claim paper

Read More »

127

Voted

ECML
2004
Springer

112views Machine Learning» more ECML 2004»

Convergence and Divergence in Standard and Averaging Reinforcement Learning

15 years 7 months ago

Download igitur-archive.library.uu.nl

Although tabular reinforcement learning (RL) methods have been proved to converge to an optimal policy, the combination of particular conventional reinforcement learning techniques...

Marco Wiering

claim paper

Read More »

130

Voted

IJCAI
2001

163views Artificial Intelligence» more IJCAI 2001»

Exploiting Multiple Secondary Reinforcers in Policy Gradient Reinforcement Learning

15 years 3 months ago

Download www.cs.colorado.edu

Most formulations of Reinforcement Learning depend on a single reinforcement reward value to guide the search for the optimal policy solution. If observation of this reward is rar...

Gregory Z. Grudic, Lyle H. Ungar

claim paper

Read More »

« Prev « First page 3 / 8 Last » Next »

Sciweavers

Explore & Download

Productivity Tools

Document Tools

Image Tools

Sciweavers