Policy gradient methods for reinforcement learning avoid some of the undesirable properties of the value function approaches, such as policy degradation (Baxter and Bartlett, 2001...
Evan Greensmith, Peter L. Bartlett, Jonathan Baxte...
We study the problem of dynamic spectrum sensing and access in cognitive radio systems as a partially observed Markov decision process (POMDP). A group of cognitive users cooperati...
Jayakrishnan Unnikrishnan, Venugopal V. Veeravalli
Learning agents, whether natural or artificial, must update their internal parameters in order to improve their behavior over time. In reinforcement learning, this plasticity is ...
—We study the problem of dynamic spectrum sensing and access in cognitive radio systems as a partially observed Markov decision process (POMDP). A group of cognitive users cooper...
Jayakrishnan Unnikrishnan, Venugopal V. Veeravalli
Abstract. We consider two-person zero-sum stochastic mean payoff games with perfect information, or BWR-games, given by a digraph G = (V = VB VW VR, E), with local rewards r : E R...
Endre Boros, Khaled M. Elbassioni, Vladimir Gurvic...