We present the first temporal-difference learning algorithm for off-policy control with unrestricted linear function approximation whose per-time-step complexity is linear in the ...
We consider a wireless collision channel, shared by a finite number of mobile users who transmit to a common base station using a random access protocol. Mobiles are selfoptimizin...
When exploring a game over a large strategy space, it may not be feasible or cost-effective to evaluate the payoff of every relevant strategy profile. For example, determining a p...
Patrick R. Jordan, Yevgeniy Vorobeychik, Michael P...
This paper presents a novel approach for leveraging automatically extracted textual knowledge to improve the performance of control applications such as games. Our ultimate goal i...
Markov decision processes (MDPs) are controllable discrete event systems with stochastic transitions. The payoff received by the controller can be evaluated in different ways, dep...