We consider the problem of multi-task reinforcement learning, where the agent needs to solve a sequence of Markov Decision Processes (MDPs) chosen randomly from a fixed but unknow...
Aaron Wilson, Alan Fern, Soumya Ray, Prasad Tadepa...
— This paper addresses learning based adaptive resource allocation for wireless MIMO channels with Markovian fading. The problem is posed as Constrained Markov Decision Process w...
Partially-observable Markov decision processes (POMDPs) provide a powerful model for sequential decision-making problems with partially-observed state and are known to have (appro...
We present a new algorithm, called incremental least squares policy iteration (ILSPI), for finding the infinite-horizon stationary policy for partially observable Markov decision ...
In cognitive networks, since nodes generally belong to different authorities and pursue different goals, they will not cooperate with others unless cooperation can improve their ow...