Sciweavers

16 search results - page 3 / 4
» Bounded Parameter Markov Decision Processes with Average Rew...
Sort
View
NIPS
2001
13 years 7 months ago
Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning
Policy gradient methods for reinforcement learning avoid some of the undesirable properties of the value function approaches, such as policy degradation (Baxter and Bartlett, 2001...
Evan Greensmith, Peter L. Bartlett, Jonathan Baxte...
CORR
2008
Springer
189views Education» more  CORR 2008»
13 years 5 months ago
Algorithms for Dynamic Spectrum Access with Learning for Cognitive Radio
We study the problem of dynamic spectrum sensing and access in cognitive radio systems as a partially observed Markov decision process (POMDP). A group of cognitive users cooperati...
Jayakrishnan Unnikrishnan, Venugopal V. Veeravalli
NECO
2007
150views more  NECO 2007»
13 years 5 months ago
Reinforcement Learning, Spike-Time-Dependent Plasticity, and the BCM Rule
Learning agents, whether natural or artificial, must update their internal parameters in order to improve their behavior over time. In reinforcement learning, this plasticity is ...
Dorit Baras, Ron Meir
CDC
2008
IEEE
197views Control Systems» more  CDC 2008»
14 years 7 days ago
Dynamic spectrum access policies for cognitive radio
—We study the problem of dynamic spectrum sensing and access in cognitive radio systems as a partially observed Markov decision process (POMDP). A group of cognitive users cooper...
Jayakrishnan Unnikrishnan, Venugopal V. Veeravalli
IPCO
2010
125views Optimization» more  IPCO 2010»
13 years 7 months ago
A Pumping Algorithm for Ergodic Stochastic Mean Payoff Games with Perfect Information
Abstract. We consider two-person zero-sum stochastic mean payoff games with perfect information, or BWR-games, given by a digraph G = (V = VB VW VR, E), with local rewards r : E R...
Endre Boros, Khaled M. Elbassioni, Vladimir Gurvic...