In this paper, we study the online learning problem involving rested and restless multiarmed bandits with multiple plays. The system consists of a single player/user and a set of K...
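To make "multiple plays" concrete, here is a minimal sketch that selects, in every round, the M arms with the highest UCB1-style indices and observes a reward from each of them. It assumes a simplified setting with i.i.d. Bernoulli arms rather than the rested or restless Markovian arms studied above. The function `ucb_multiple_plays` and its parameters are hypothetical illustrations, not the paper's algorithm.

```python
import math
import random


def ucb_multiple_plays(means, m, horizon, seed=0):
    """Play m of K arms per round by picking the m largest UCB1-style indices.

    Assumption (simplified setting): arm k pays 1 with probability means[k],
    independently over time. Requires 1 <= m <= len(means).
    Returns how many times each arm was played.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k   # number of plays of each arm
    sums = [0.0] * k   # cumulative reward of each arm
    t = 0              # total number of individual arm plays so far
    for _ in range(horizon):
        untried = [a for a in range(k) if counts[a] == 0]
        if untried:
            # Initialisation: play unplayed arms first, fill remaining slots arbitrarily.
            chosen = untried[:m]
            rest = [a for a in range(k) if a not in chosen]
            chosen += rest[: m - len(chosen)]
        else:
            # Empirical mean plus exploration bonus for every arm.
            index = [sums[a] / counts[a] + math.sqrt(2 * math.log(t) / counts[a])
                     for a in range(k)]
            chosen = sorted(range(k), key=lambda a: index[a], reverse=True)[:m]
        for a in chosen:  # m distinct arms, each yields its own reward
            reward = 1.0 if rng.random() < means[a] else 0.0
            counts[a] += 1
            sums[a] += reward
            t += 1
    return counts
```

For example, `ucb_multiple_plays([0.9, 0.8, 0.2, 0.1], m=2, horizon=2000)` should spend most of its 4000 plays on the two best arms.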
The multiarmed bandit problem is a typical example of the dilemma between exploration and exploitation in reinforcement learning. This problem is expressed as a model of a gambler playi...
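The exploration/exploitation dilemma can be illustrated with the simple epsilon-greedy rule: with probability epsilon the player explores a random arm, otherwise it exploits the arm with the best estimated mean. This is a minimal sketch assuming Bernoulli reward arms; `epsilon_greedy` and its parameters are hypothetical names chosen for illustration, not the method of the work abstracted above.

```python
import random


def epsilon_greedy(means, horizon, epsilon=0.1, seed=0):
    """Epsilon-greedy play on Bernoulli arms (assumed reward model).

    Returns (number of pulls of each arm, total reward collected).
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    estimates = [0.0] * k  # running mean reward of each arm
    total = 0.0
    for _ in range(horizon):
        if rng.random() < epsilon or 0 in counts:
            # Explore: uniformly random arm (also until every arm has been tried).
            arm = rng.randrange(k)
        else:
            # Exploit: arm with the highest estimated mean so far.
            arm = max(range(k), key=lambda a: estimates[a])
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental mean
        total += reward
    return counts, total
```

A larger epsilon finds the best arm more reliably but keeps wasting a fixed fraction of plays on inferior arms, which is exactly the trade-off described above.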
We consider an opportunistic spectrum access (OSA) problem where the time-varying condition of each channel (e.g., as a result of random fading or certain primary users' activ...
We present a modification of the algorithm of Dani et al. [8] for the online linear optimization problem in the bandit setting, which with high probability has regret at most O(...
Peter L. Bartlett, Varsha Dani, Thomas P. Hayes, S...
Bandit convex optimization is a special case of online convex optimization with partial information. In this setting, a player attempts to minimize a sequence of adversarially gen...
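With partial information the player sees only the loss value f_t(x_t), not its gradient. One standard way around this (the one-point gradient estimator of Flaxman, Kalai and McMahan) uses the identity that (d/δ)·f(x + δu)·u, with u uniform on the unit sphere, is an unbiased estimate of the gradient of a δ-smoothed version of f. The sketch below plugs that estimate into projected gradient descent over a Euclidean ball; the ball-shaped feasible set, step size, and all function names are assumptions for illustration and are not necessarily the method of the work abstracted above.

```python
import math
import random


def random_unit_vector(d, rng):
    """Uniformly random direction in R^d (normalised Gaussian sample)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def one_point_gradient(f, x, delta, rng):
    """Estimate the gradient of the delta-smoothed f from ONE value of f.

    g = (d / delta) * f(x + delta * u) * u, with u uniform on the unit sphere.
    """
    d = len(x)
    u = random_unit_vector(d, rng)
    value = f([xi + delta * ui for xi, ui in zip(x, u)])  # the only feedback used
    return [(d / delta) * value * ui for ui in u]


def bandit_gradient_descent(losses, x0, eta, delta, radius, seed=0):
    """Projected gradient descent driven by one-point gradient estimates.

    Assumption: the feasible set is the ball of the given radius; iterates are
    kept in the shrunken ball of radius (radius - delta), so every queried
    point x + delta * u stays feasible. Requires radius > delta.
    """
    rng = random.Random(seed)
    x = list(x0)
    inner = radius - delta
    for f in losses:  # each f is a convex loss, queried only by value
        g = one_point_gradient(f, x, delta, rng)
        x = [xi - eta * gi for xi, gi in zip(x, g)]
        norm = math.sqrt(sum(xi * xi for xi in x))
        if norm > inner:  # Euclidean projection onto the shrunken ball
            x = [xi * inner / norm for xi in x]
    return x
```

The estimate has high variance when delta is small, so delta trades off bias from smoothing against noise in the gradient estimate.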