Abstract. In the multiarmed bandit problem, a gambler must decide which arm of K nonidentical slot machines to play in a sequence of trials so as to maximize his reward. This class...
Computing a good policy in stochastic uncertain environments with unknown dynamics and reward model parameters is a challenging task. In a number of domains, ranging from space ro...
The application of reinforcement learning algorithms to Partially Observable Stochastic Games (POSG) is challenging since each agent does not have access to the whole state inform...
Alessandro Lazaric, Mario Quaresimale, Marcello Re...
We consider a wireless collision channel, shared by a finite number of mobile users who transmit to a common base station using a random access protocol. Mobiles are selfoptimizin...