Sciweavers

771 search results - page 36 / 155
» Markov Decision Processes with Arbitrary Reward Processes
Sort
View
CORR
2011
Springer
175views Education» more  CORR 2011»
14 years 8 months ago
Adaptive Channel Recommendation for Dynamic Spectrum Access
—We propose a dynamic spectrum access scheme where secondary users recommend “good” channels to each other and access accordingly. We formulate the problem as an average rewa...
Xu Chen, Jianwei Huang, Husheng Li
ICML
2008
IEEE
16 years 2 months ago
Reinforcement learning with limited reinforcement: using Bayes risk for active learning in POMDPs
Partially Observable Markov Decision Processes (POMDPs) have succeeded in planning domains that require balancing actions that increase an agent's knowledge and actions that ...
Finale Doshi, Joelle Pineau, Nicholas Roy
146
Voted
ICML
1999
IEEE
16 years 2 months ago
Least-Squares Temporal Difference Learning
Excerpted from: Boyan, Justin. Learning Evaluation Functions for Global Optimization. Ph.D. thesis, Carnegie Mellon University, August 1998. (Available as Technical Report CMU-CS-...
Justin A. Boyan
AAAI
2008
15 years 4 months ago
A Variance Analysis for POMDP Policy Evaluation
Partially Observable Markov Decision Processes have been studied widely as a model for decision making under uncertainty, and a number of methods have been developed to find the s...
Mahdi Milani Fard, Joelle Pineau, Peng Sun
109
Voted
UAI
2000
15 years 3 months ago
PEGASUS: A policy search method for large MDPs and POMDPs
We propose a new approach to the problem of searching a space of policies for a Markov decision process (MDP) or a partially observable Markov decision process (POMDP), given a mo...
Andrew Y. Ng, Michael I. Jordan