Search Sciweavers | Sciweavers

86 search results - page 6 / 18

Evolution of reward functions for reinforcement learning

ACL
2009

Reinforcement Learning for Mapping Instructions to Actions

Download www.aclweb.org

In this paper, we present a reinforcement learning approach for mapping natural language instructions to sequences of executable actions. We assume access to a reward function tha...

S. R. K. Branavan, Harr Chen, Luke S. Zettlemoyer,...

NN
2010
Springer

Efficient exploration through active learning for value function approximation in reinforcement learning

Download sugiyama-www.cs.titech.ac.jp

Appropriately designing sampling policies is highly important for obtaining better control policies in reinforcement learning. In this paper, we first show that the least-squares ...

Takayuki Akiyama, Hirotaka Hachiya, Masashi Sugiya...

JMLR
2012

Contextual Bandit Learning with Predictable Rewards

Download www.cs.princeton.edu

Contextual bandit learning is a reinforcement learning problem where the learner repeatedly receives a set of features (context), takes an action and receives a reward based on th...

Alekh Agarwal, Miroslav Dudík, Satyen Kale,...
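
The interaction protocol described in this snippet (observe a context, choose an action, see the reward for that action only) can be sketched as follows. This is a minimal epsilon-greedy learner over a small set of discrete contexts, chosen only to illustrate the protocol; it is not the algorithm from the Agarwal et al. paper. The function name `run_contextual_bandit`, the tabular reward estimates, and the toy reward function are hypothetical.

```python
import random


def run_contextual_bandit(contexts, reward_fn, n_actions, epsilon=0.1, seed=0):
    """Epsilon-greedy learner for a contextual bandit with discrete contexts.

    Each round the learner sees a context x, picks an action a, and observes
    only reward_fn(x, a); rewards of the other actions stay hidden.
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    counts = {}  # (context, action) -> number of times played
    means = {}   # (context, action) -> running mean of observed rewards
    total = 0.0
    for x in contexts:
        if rng.random() < epsilon:
            a = rng.randrange(n_actions)  # explore uniformly at random
        else:
            # Exploit: highest estimated reward; unseen pairs default to 0.0.
            a = max(range(n_actions), key=lambda b: means.get((x, b), 0.0))
        r = reward_fn(x, a)  # bandit feedback for the chosen action only
        n = counts.get((x, a), 0) + 1
        counts[(x, a)] = n
        m = means.get((x, a), 0.0)
        means[(x, a)] = m + (r - m) / n  # incremental mean update
        total += r
    return total, means


if __name__ == "__main__":
    # Toy problem (hypothetical): two contexts, and the best action equals the context.
    contexts = [i % 2 for i in range(1000)]
    total, means = run_contextual_bandit(
        contexts, lambda x, a: 1.0 if a == x else 0.0, n_actions=2
    )
    print(total, means)
```

Epsilon-greedy is used here only because it is the simplest way to show why bandit feedback forces exploration: the learner never sees the reward of actions it did not take.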

ICML
2000
IEEE

Reinforcement Learning in POMDP's via Direct Gradient Ascent

Download reference.kfupm.edu.sa

This paper discusses theoretical and experimental aspects of gradient-based approaches to the direct optimization of policy performance in controlled POMDPs. We introduce ??? ?, a...

Jonathan Baxter, Peter L. Bartlett
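
Gradient-based direct policy search of the kind this snippet describes can be illustrated with a likelihood-ratio (REINFORCE-style) gradient estimator. The estimator keeps a discounted eligibility trace of score functions and averages reward times trace along a single trajectory. The sketch below assumes a tabular softmax policy over observations and a toy two-observation environment. It is a generic illustration, not a reproduction of the paper's algorithm, and `estimate_gradient`, `toy_env`, `beta`, and `T` are hypothetical names.

```python
import math
import random


def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def estimate_gradient(theta, step_env, obs0, beta=0.9, T=5000, rng=None):
    """Estimate d(average reward)/d(theta) from one trajectory of length T.

    theta[o][a] is the preference for action a after observation o
    (tabular softmax policy). Hypothetical helper for illustration only.
    """
    rng = rng or random.Random(0)
    n_obs, n_act = len(theta), len(theta[0])
    z = [[0.0] * n_act for _ in range(n_obs)]      # discounted eligibility trace
    delta = [[0.0] * n_act for _ in range(n_obs)]  # running-average gradient estimate
    obs = obs0
    for t in range(T):
        probs = softmax(theta[obs])
        a = rng.choices(range(n_act), weights=probs)[0]
        # Decay the trace, then add grad log pi(a | obs).
        # For a tabular softmax: d log pi / d theta[obs][b] = 1{b == a} - pi(b | obs).
        for o in range(n_obs):
            for b in range(n_act):
                z[o][b] *= beta
        for b in range(n_act):
            z[obs][b] += (1.0 if b == a else 0.0) - probs[b]
        obs, r = step_env(obs, a, rng)
        # Incremental mean of r * z over the trajectory.
        for o in range(n_obs):
            for b in range(n_act):
                delta[o][b] += (r * z[o][b] - delta[o][b]) / (t + 1)
    return delta


def toy_env(obs, action, rng):
    """Hypothetical environment: reward 1 for matching the observation, next observation uniform."""
    reward = 1.0 if action == obs else 0.0
    return rng.randrange(2), reward


if __name__ == "__main__":
    theta = [[0.0, 0.0], [0.0, 0.0]]
    rng = random.Random(1)
    for _ in range(20):  # plain gradient ascent on the estimated gradient
        g = estimate_gradient(theta, toy_env, obs0=0, T=2000, rng=rng)
        for o in range(2):
            for b in range(2):
                theta[o][b] += 5.0 * g[o][b]
    print([softmax(row) for row in theta])
```

The trace discount `beta` trades bias for variance. Values close to 1 credit rewards to actions further in the past, at the cost of noisier estimates.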

NECO
2010

Derivatives of Logarithmic Stationary Distributions for Policy Gradient Reinforcement Learning

Download www.kyb.tuebingen.mpg.de

Most conventional Policy Gradient Reinforcement Learning (PGRL) algorithms neglect (or do not explicitly make use of) a term in the average reward gradient with respect to the pol...

Tetsuro Morimura, Eiji Uchibe, Junichiro Yoshimoto...
