Sciweavers

AAAI
2012

Generalized Sampling and Variance in Counterfactual Regret Minimization

In large extensive-form games with imperfect information, Counterfactual Regret Minimization (CFR) is a popular iterative algorithm for computing approximate Nash equilibria. While the base algorithm performs a full tree traversal on each iteration, Monte Carlo CFR (MCCFR) reduces the per-iteration time cost by traversing just a sampled portion of the tree. On the other hand, MCCFR’s sampled values introduce variance, and the effects of this variance were previously unknown. In this paper, we generalize MCCFR by considering any generic estimator of the sought values. We show that any choice of an estimator can be used to probabilistically minimize regret, provided the estimator is bounded and unbiased. In addition, we relate the variance of the estimator to the convergence rate of an algorithm that calculates regret directly from the estimator. We demonstrate the application of our analysis by defining a new bounded, unbiased estimator with empirically lower variance than MCCFR es...
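The idea of minimizing regret from a bounded, unbiased estimate can be illustrated on a much smaller problem than the abstract describes. The sketch below is not the paper's estimator and is not CFR on an extensive-form game: it is an assumed toy setup that runs regret matching in self-play on rock-paper-scissors, with each player updating regrets from a single importance-weighted sample. All names (`PAYOFF`, `sampled_values`, `self_play`, the `epsilon` exploration parameter) are hypothetical and chosen only for illustration.

```python
import random

# Payoff of playing action a against action b in rock-paper-scissors.
# Toy game chosen for illustration; it does not come from the paper.
# The game is symmetric, so both players can read payoffs from this matrix.
PAYOFF = [
    [0, -1, 1],   # rock     vs rock, paper, scissors
    [1, 0, -1],   # paper
    [-1, 1, 0],   # scissors
]
N = 3


def regret_matching(regrets):
    """Play in proportion to positive cumulative regret (uniform if none)."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total <= 0.0:
        return [1.0 / len(regrets)] * len(regrets)
    return [p / total for p in positive]


def true_values(opp_strategy):
    """Exact expected payoff of each action against opp_strategy."""
    return [sum(opp_strategy[b] * PAYOFF[a][b] for b in range(N))
            for a in range(N)]


def sampling_distribution(strategy, epsilon):
    """Mix the strategy with uniform exploration so every action has
    probability at least epsilon / N, which keeps the estimator bounded."""
    return [(1 - epsilon) * s + epsilon / N for s in strategy]


def sampled_values(a, b, sample_probs):
    """Value estimate from one sampled action pair (a, b).

    Only the sampled action gets a nonzero entry; dividing by its sampling
    probability makes the estimate unbiased for true_values."""
    estimate = [0.0] * N
    estimate[a] = PAYOFF[a][b] / sample_probs[a]
    return estimate


def expected_estimate(sample_probs, opp_strategy):
    """Average the estimator over every possible sample (exact enumeration)."""
    mean = [0.0] * N
    for a in range(N):
        for b in range(N):
            weight = sample_probs[a] * opp_strategy[b]
            estimate = sampled_values(a, b, sample_probs)
            for i in range(N):
                mean[i] += weight * estimate[i]
    return mean


def self_play(iterations, epsilon=0.3, seed=0):
    """Regret matching for both players, with regrets computed from the
    sampled estimate instead of the exact values. Returns average strategies."""
    rng = random.Random(seed)
    regrets = [[0.0] * N, [0.0] * N]
    strategy_sums = [[0.0] * N, [0.0] * N]
    for _ in range(iterations):
        strategies = [regret_matching(r) for r in regrets]
        for player in (0, 1):
            probs = sampling_distribution(strategies[player], epsilon)
            a = rng.choices(range(N), weights=probs)[0]
            b = rng.choices(range(N), weights=strategies[1 - player])[0]
            estimate = sampled_values(a, b, probs)
            # Estimated value of the current strategy under the same sample.
            baseline = sum(s * v for s, v in zip(strategies[player], estimate))
            for i in range(N):
                regrets[player][i] += estimate[i] - baseline
                strategy_sums[player][i] += strategies[player][i]
    return [[s / iterations for s in sums] for sums in strategy_sums]
```

Comparing `expected_estimate(probs, opp)` with `true_values(opp)` checks unbiasedness by enumeration, and the exploration floor `epsilon / N` bounds every estimate by `N / epsilon` in magnitude. A smaller `epsilon` keeps sampling closer to the current strategy but lets the estimates grow larger, which is one simple way to see the variance trade-off the abstract refers to.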
Added 29 Sep 2012
Updated 29 Sep 2012
Type Journal
Year 2012
Where AAAI
Authors Richard G. Gibson, Marc Lanctot, Neil Burch, Duane Szafron, Michael Bowling