Regret minimization has proven to be a very powerful tool in both computational learning theory and online algorithms. Regret minimization algorithms can guarantee, for a single de...
In the multi-armed bandit (MAB) problem there are k distributions associated with the rewards of playing each of k strategies (slot machine arms). The reward distributions are ini...
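As a minimal sketch of the setting just described, the following Python simulation builds k arms with reward distributions unknown to the learner and runs a standard index policy against them. Several details are assumptions made purely for illustration and are not taken from the text: rewards are Bernoulli with hypothetical means, the learner is the well-known UCB1 rule, and the helper names (`make_bernoulli_arms`, `ucb1`, `pseudo_regret`) are invented here.

```python
import math
import random


def make_bernoulli_arms(means, rng):
    """Return one reward sampler per arm.

    Bernoulli rewards are an assumption of this sketch; the learner never
    sees `means`, only the sampled rewards.
    """
    return [lambda p=p: 1.0 if rng.random() < p else 0.0 for p in means]


def ucb1(arms, horizon):
    """Play `horizon` rounds with the UCB1 index rule; return pull counts."""
    k = len(arms)
    counts = [0] * k      # number of times each arm was played
    sums = [0.0] * k      # total reward collected from each arm
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # initialization: play every arm once
        else:
            # empirical mean plus an exploration bonus that shrinks with counts
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = arms[arm]()
        counts[arm] += 1
        sums[arm] += reward
    return counts


def pseudo_regret(means, counts):
    """Expected reward lost relative to always playing the best arm."""
    best = max(means)
    return sum((best - m) * n for m, n in zip(means, counts))


# Hypothetical instance: three arms, 2000 rounds, fixed seed for repeatability.
means = [0.2, 0.5, 0.8]
horizon = 2000
rng = random.Random(0)
counts = ucb1(make_bernoulli_arms(means, rng), horizon)
regret = pseudo_regret(means, counts)
```

Here `counts` records how often each arm was played, and `regret` measures the expected reward forgone by not always playing the arm with the highest mean. Any other policy can be substituted for `ucb1` to compare regret on the same instance.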