Search Sciweavers | Sciweavers

16 search results - page 2 / 4

» Bounded Parameter Markov Decision Processes with Average Rew...

click to vote

JMLR
2010

189views more JMLR 2010»

Adaptive Step-size Policy Gradients with Average Reward Metric

13 years 4 days ago

Download jmlr.csail.mit.edu

In this paper, we propose a novel adaptive step-size approach for policy gradient reinforcement learning. A new metric is defined for policy gradients that measures the effect of ...

Takamitsu Matsubara, Tetsuro Morimura, Jun Morimot...

claim paper

Read More »

click to vote

CDC
2009
IEEE

169views Control Systems» more CDC 2009»

Parametric regret in uncertain Markov decision processes

13 years 10 months ago

Download www.cim.mcgill.ca

— We consider decision making in a Markovian setup where the reward parameters are not known in advance. Our performance criterion is the gap between the performance of the best ...

Huan Xu, Shie Mannor

claim paper

Read More »

click to vote

FOCS
2007
IEEE

157views Theoretical Computer Science» more FOCS 2007»

Approximation Algorithms for Partial-Information Based Stochastic Control with Markovian Rewards

13 years 11 months ago

Download www.cis.upenn.edu

We consider a variant of the classic multi-armed bandit problem (MAB), which we call FEEDBACK MAB, where the reward obtained by playing each of n independent arms varies according...

Sudipto Guha, Kamesh Munagala

claim paper

Read More »

click to vote

EWRL
2008

186views Machine Learning» more EWRL 2008»

Efficient Reinforcement Learning in Parameterized Models: Discrete Parameter Case

13 years 7 months ago

Download webee.technion.ac.il

We consider reinforcement learning in the parameterized setup, where the model is known to belong to a parameterized family of Markov Decision Processes (MDPs). We further impose ...

Kirill Dyagilev, Shie Mannor, Nahum Shimkin

claim paper

Read More »

click to vote

NIPS
2007

146views Information Technology» more NIPS 2007»

Optimistic Linear Programming gives Logarithmic Regret for Irreducible MDPs

13 years 6 months ago

Download books.nips.cc

We present an algorithm called Optimistic Linear Programming (OLP) for learning to optimize average reward in an irreducible but otherwise unknown Markov decision process (MDP). O...

Ambuj Tewari, Peter L. Bartlett

claim paper

Read More »

« Prev « First page 2 / 4 Last » Next »

Sciweavers

Explore & Download

Productivity Tools

Document Tools

Image Tools

Sciweavers