Search Sciweavers | Sciweavers

69 search results - page 7 / 14

» PAC-Bayesian Policy Evaluation for Reinforcement Learning


GECCO
2006
Springer

Optimization

On-line evolutionary computation for reinforcement learning in stochastic domains

15 years 3 months ago

Download userweb.cs.utexas.edu

In reinforcement learning, an agent interacting with its environment strives to learn a policy that specifies, for each state it may encounter, what action to take. Evolutionary c...

Shimon Whiteson, Peter Stone


IJCAI
2007

Artificial Intelligence

Heuristic Selection of Actions in Multiagent Reinforcement Learning

15 years 1 months ago

Download www.ijcai.org

This work presents a new algorithm, called Heuristically Accelerated Minimax-Q (HAMMQ), that allows the use of heuristics to speed up the well-known Multiagent Reinforcement Learni...

Reinaldo A. C. Bianchi, Carlos H. C. Ribeiro, Anna...


ILP
2007
Springer

Automated Reasoning

Building Relational World Models for Reinforcement Learning

15 years 5 months ago

Download ftp.cs.wisc.edu

Abstract. Many reinforcement learning domains are highly relational. While traditional temporal-difference methods can be applied to these domains, they are limited in their capaci...

Trevor Walker, Lisa Torrey, Jude W. Shavlik, Richa...


AAAI
2012

Intelligent Agents

Kernel-Based Reinforcement Learning on Representative States

13 years 2 months ago

Download www.bkveton.com

Markov decision processes (MDPs) are an established framework for solving sequential decision-making problems under uncertainty. In this work, we propose a new method for batchmod...

Branislav Kveton, Georgios Theocharous


ATAL
2008
Springer

Intelligent Agents

Sigma point policy iteration

15 years 1 months ago

Download web.mit.edu

In reinforcement learning, least-squares temporal difference methods (e.g., LSTD and LSPI) are effective, data-efficient techniques for policy evaluation and control with linear v...

Michael H. Bowling, Alborz Geramifard, David Winga...

