We present the first temporal-difference learning algorithm for off-policy control with unrestricted linear function approximation whose per-time-step complexity is linear in the ...
We introduce gesture controllers, a method for animating the body language of avatars engaged in live spoken conversation. A gesture controller is an optimal-policy controller tha...
Sergey Levine, Philipp Krähenbühl, Sebastian Thr...
Decentralized Markov decision processes are frequently used to model cooperative multi-agent systems. In this paper, we identify a subclass of general DEC-MDPs that features regul...
We consider the problem of efficiently learning optimal control policies and value functions over large state spaces in an online setting in which estimates must be available afte...