Sciweavers

4345 search results - page 344 / 869
» Relational Reinforcement Learning
NN
2010
Springer
125 views, Neural Networks
15 years 3 months ago
Parameter-exploring policy gradients
We present a model-free reinforcement learning method for partially observable Markov decision problems. Our method estimates a likelihood gradient by sampling directly in paramet...
Frank Sehnke, Christian Osendorfer, Thomas Rü...
PKDD
2010
Springer
164 views, Data Mining
15 years 3 months ago
Efficient Planning in Large POMDPs through Policy Graph Based Factorized Approximations
Partially observable Markov decision processes (POMDPs) are widely used for planning under uncertainty. In many applications, the huge size of the POMDP state space makes straightf...
Joni Pajarinen, Jaakko Peltonen, Ari Hottinen, Mik...
GLOBECOM
2009
IEEE
15 years 2 months ago
Cooperative Communications with Relay Selection for QoS Provisioning in Wireless Sensor Networks
Cooperative communications have been demonstrated to be effective in combating the multiple fading effects in wireless networks, and improving the network performance in ...
Xuedong Liang, Ilangko Balasingham, Victor C. M. L...
COGSR
2011
71 views
15 years 3 days ago
Psychological models of human and optimal performance in bandit problems
In bandit problems, a decision-maker must choose between a set of alternatives, each of which has a fixed but unknown rate of reward, to maximize their total number of rewards ov...
Michael D. Lee, Shunan Zhang, Miles Munro, Mark St...
JAIR
2011
187 views
15 years 2 days ago
A Monte-Carlo AIXI Approximation
This paper describes a computationally feasible approximation to the AIXI agent, a universal reinforcement learning agent for arbitrary environments. AIXI is scaled down in two ke...
Joel Veness, Kee Siong Ng, Marcus Hutter, William ...