Learning theory has largely focused on two main learning scenarios. The first is the classical statistical setting where instances are drawn i.i.d. from a fixed distribution and...
Alexander Rakhlin, Karthik Sridharan, Ambuj Tewari
We study on-line decision problems where the set of actions that are available to the decision algorithm vary over time. With a few notable exceptions, such problems remained larg...
Robert D. Kleinberg, Alexandru Niculescu-Mizil, Yo...
In the multi-armed bandit (MAB) problem there are k distributions associated with the rewards of playing each of k strategies (slot machine arms). The reward distributions are ini...