We consider multi-armed bandit problems where the number of arms is larger than the possible number of experiments. We make a stochastic assumption on the mean-reward of a new sel...
While in general trading off exploration and exploitation in reinforcement learning is hard, under some formulations relatively simple solutions exist. Optimal decision thresholds ...
In an online linear optimization problem, on each period t, an online algorithm chooses st S from a fixed (possibly infinite) set S of feasible decisions. Nature (who may be adve...
We consider the problem of revenue-optimal dynamic mechanism design in settings where agents' types evolve over time as a function of their (both public and private) experien...
In the multi-armed bandit problem, an online algorithm must choose from a set of strategies in a sequence of n trials so as to minimize the total cost of the chosen strategies. Wh...