In timed, zero-sum games, the goal is to maximize the probability of winning, which is not necessarily the same as maximizing our expected reward. We consider cumulative intermedi...
R-max is a very simple model-based reinforcement learning algorithm which can attain near-optimal average reward in polynomial time. In R-max, the agent always maintains a complet...
— In autonomous mobile ad hoc networks (MANET) where each user is its own authority, fully cooperative behaviors, such as unconditionally forwarding packets for each other or, ho...