Online Learning with Variable Stage Duration

We consider online learning in repeated decision problems, within the framework of a repeated game against an arbitrary opponent. For repeated matrix games, well-known results establish the existence of no-regret strategies; such strategies secure a long-term average payoff that comes close to the maximal payoff that could be obtained, in hindsight, by playing any fixed action against the observed actions of the opponent. In the present paper we consider the extended model where the duration of each stage of the game may depend on the actions of both players, while the performance measure of interest is the average payoff per unit time. We begin the analysis of online learning in repeated games with variable stage duration by showing that no-regret strategies, in the above sense, do not exist in general. Consequently, we consider two classes of adaptive strategies, one based on Blackwell's approachability theorem and the other on calibrated forecasts, and examine their performance...
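To make the performance measure concrete, the following is a minimal Python sketch of "average payoff per unit time" and of a hindsight benchmark for a fixed action. It is an illustration only: the names `reward`, `duration`, `my_actions`, and `opponent_actions` are hypothetical, the toy matrices are invented, and the benchmark used here (cumulative reward divided by cumulative time for the best fixed action against the observed opponent actions) is one reading of the abstract, not necessarily the paper's exact definition.

```python
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def average_payoff_per_unit_time(rewards: Sequence[float],
                                 durations: Sequence[float]) -> float:
    """Cumulative reward divided by cumulative elapsed time."""
    total_time = sum(durations)
    if total_time <= 0:
        raise ValueError("total duration must be positive")
    return sum(rewards) / total_time


def achieved_rate(reward: Matrix, duration: Matrix,
                  my_actions: Sequence[int],
                  opponent_actions: Sequence[int]) -> float:
    """Payoff per unit time actually obtained along the play path."""
    rewards = [reward[a][b] for a, b in zip(my_actions, opponent_actions)]
    durations = [duration[a][b] for a, b in zip(my_actions, opponent_actions)]
    return average_payoff_per_unit_time(rewards, durations)


def best_fixed_action_rate(reward: Matrix, duration: Matrix,
                           opponent_actions: Sequence[int]) -> float:
    """Best payoff per unit time of any single action, in hindsight.

    Assumption: the opponent's observed actions are held fixed, and both
    reward and stage duration are recomputed for the fixed action.
    """
    return max(
        average_payoff_per_unit_time(
            [reward[a][b] for b in opponent_actions],
            [duration[a][b] for b in opponent_actions],
        )
        for a in range(len(reward))
    )


def regret_per_unit_time(reward: Matrix, duration: Matrix,
                         my_actions: Sequence[int],
                         opponent_actions: Sequence[int]) -> float:
    """Hindsight benchmark minus achieved rate (positive means shortfall)."""
    return (best_fixed_action_rate(reward, duration, opponent_actions)
            - achieved_rate(reward, duration, my_actions, opponent_actions))


# Toy 2x2 game (invented numbers): rows are our actions, columns the opponent's.
reward = [[1.0, 0.0], [0.0, 2.0]]
duration = [[1.0, 1.0], [1.0, 4.0]]   # stage length depends on both actions
my_actions = [0, 0, 1, 1]
opponent_actions = [0, 1, 0, 1]

regret = regret_per_unit_time(reward, duration, my_actions, opponent_actions)
```

In this toy run the achieved rate is 3/7, the best fixed action (action 0) yields 2/4, and the shortfall is 1/14. Note that the action with the larger total reward (action 1, total 4) is not the better one per unit time, because its stages last longer; this is the feature that separates the variable-duration model from the standard repeated matrix game.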

Shie Mannor, Nahum Shimkin

Average Payoff | COLT 2006 | Long-term Average Payoff | Machine Learning | Repeated Game

Added	20 Aug 2010
Updated	20 Aug 2010
Type	Conference
Year	2006
Where	COLT
Authors	Shie Mannor, Nahum Shimkin
