We consider real-time games where the goal consists, for each player, in maximizing the average amount of reward he or she receives per time unit. We consider zero-sum rewards, so ...
We study price-per-reward games on hybrid automata with strong resets. They generalise priced games previously studied and have applications in scheduling. We obtain decidability r...
Abstract. We consider Reinforcement Learning for average reward zerosum stochastic games. We present and analyze two algorithms. The first is based on relative Q-learning and the ...
R-max is a very simple model-based reinforcement learning algorithm which can attain near-optimal average reward in polynomial time. In R-max, the agent always maintains a complet...
In timed, zero-sum games, the goal is to maximize the probability of winning, which is not necessarily the same as maximizing our expected reward. We consider cumulative intermedi...