Search Sciweavers | Sciweavers

» Relational Reinforcement Learning

NN
2010
Springer

Neural Networks

Parameter-exploring policy gradients

15 years 3 months ago

Download www.kyb.mpg.de

We present a model-free reinforcement learning method for partially observable Markov decision problems. Our method estimates a likelihood gradient by sampling directly in paramet...

Frank Sehnke, Christian Osendorfer, Thomas Rü...

PKDD
2010
Springer

Data Mining

Efficient Planning in Large POMDPs through Policy Graph Based Factorized Approximations

15 years 3 months ago

Download users.ics.tkk.fi

Partially observable Markov decision processes (POMDPs) are widely used for planning under uncertainty. In many applications, the huge size of the POMDP state space makes straightf...

Joni Pajarinen, Jaakko Peltonen, Ari Hottinen, Mik...

GLOBECOM
2009
IEEE

Communications

Cooperative Communications with Relay Selection for QoS Provisioning in Wireless Sensor Networks

15 years 2 months ago

Download mmlab.snu.ac.kr

Cooperative communications have been demonstrated to be effective in combating the multiple fading effects in wireless networks, and improving the network performance in ...

Xuedong Liang, Ilangko Balasingham, Victor C. M. L...

COGSR
2011

Psychological models of human and optimal performance in bandit problems

15 years 3 days ago

Download www.socsci.uci.edu

In bandit problems, a decision-maker must choose between a set of alternatives, each of which has a fixed but unknown rate of reward, to maximize their total number of rewards ov...

Michael D. Lee, Shunan Zhang, Miles Munro, Mark St...

JAIR
2011

A Monte-Carlo AIXI Approximation

15 years 2 days ago

Download www.hutter1.net

This paper describes a computationally feasible approximation to the AIXI agent, a universal reinforcement learning agent for arbitrary environments. AIXI is scaled down in two ke...

Joel Veness, Kee Siong Ng, Marcus Hutter, William ...
