Parametric regret in uncertain Markov decision processes

We consider decision making in a Markovian setup where the reward parameters are not known in advance. Our performance criterion is the gap between the performance of the best strategy that is chosen after the true parameter realization is revealed and the performance of the strategy that is chosen before the parameter realization is revealed. We call this gap the parametric regret. We consider two related problems: minimax regret and mean-variance tradeoff of the regret. The minimax regret strategy minimizes the worst-case regret under the most adversarial possible realization. We show that the problem of computing the minimax regret strategy is NP-hard and propose algorithms to efficiently solve it under favorable conditions. The mean-variance tradeoff formulation requires a probabilistic model of the uncertain parameters and looks for a strategy that minimizes a convex combination of the mean and the variance of the regret. We prove that computing such a strategy can be done nu...
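To make the regret definition concrete, here is a minimal Python sketch on a toy problem. It is not the authors' algorithm: it assumes a discounted criterion, a finite set of candidate reward realizations, and a search restricted to deterministic stationary policies, and it simply enumerates every policy (exponential in the number of states, consistent with the hardness result mentioned in the abstract). All names and numbers (`P`, `REWARDS`, `GAMMA`, `START`) are hypothetical.

```python
from itertools import product

GAMMA = 0.9                  # assumed discount factor
N_STATES, N_ACTIONS = 2, 2
START = 0                    # assumed initial state

# Known transitions: P[s][a] is the next-state distribution.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # state 0: action 0 stays, action 1 moves to state 1
    [[1.0, 0.0], [0.0, 1.0]],  # state 1: action 0 moves to state 0, action 1 stays
]

# Uncertain rewards: REWARDS[k][s][a] is one possible realization.
REWARDS = [
    [[1.0, 0.0], [0.0, 0.0]],  # scenario A: staying in state 0 pays 1
    [[0.0, 0.0], [0.0, 2.0]],  # scenario B: staying in state 1 pays 2
]

# Every deterministic stationary policy, as a tuple (action in s0, action in s1).
POLICIES = list(product(range(N_ACTIONS), repeat=N_STATES))


def policy_value(policy, reward, iters=500):
    """Discounted value from START, by iterating the policy's Bellman equation."""
    v = [0.0] * N_STATES
    for _ in range(iters):
        v = [
            reward[s][policy[s]]
            + GAMMA * sum(p * v[t] for t, p in enumerate(P[s][policy[s]]))
            for s in range(N_STATES)
        ]
    return v[START]


def regret(policy, reward):
    """Gap between the best policy chosen in hindsight and the given policy."""
    best = max(policy_value(q, reward) for q in POLICIES)
    return best - policy_value(policy, reward)


def worst_case_regret(policy):
    """Regret under the most adversarial reward realization."""
    return max(regret(policy, r) for r in REWARDS)


# Minimax regret over the enumerated policies.
minimax_policy = min(POLICIES, key=worst_case_regret)
print(minimax_policy, round(worst_case_regret(minimax_policy), 6))
```

In this toy instance, committing to state 1 gives up the value available in scenario A but avoids the larger loss in scenario B, so it is the policy with the smallest worst-case regret among those enumerated.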
Huan Xu, Shie Mannor
Added 21 Jul 2010
Updated 21 Jul 2010
Type Conference
Year 2009
Where CDC
Authors Huan Xu, Shie Mannor