— We consider decision making in a Markovian setup where the reward parameters are not known in advance. Our performance criterion is the gap between the performance of the best ...
We consider an MDP setting in which the reward function is allowed to change during each time step of play (possibly in an adversarial manner), yet the dynamics remain fixed. Simi...