Policy gradient methods for reinforcement learning avoid some of the undesirable properties of the value function approaches, such as policy degradation (Baxter and Bartlett, 2001...
Evan Greensmith, Peter L. Bartlett, Jonathan Baxte...
We address the problem of optimally controlling stochastic environments that are partially observable. The standard method for tackling such problems is to define and solve a Part...
We consider the problem of optimal channel access to provide quality of service (QoS) for data transmission in cognitive vehicular networks. In such a network the vehicular nodes ...
In sequential decision making under uncertainty, as in many other modeling endeavors, researchers observe a dynamical system and collect data measuring its behavior over time. The...
The Affine ADD (AADD) is an extension of the Algebraic Decision Diagram (ADD) that compactly represents context-specific, additive and multiplicative structure in functions from a...
Scott Sanner, William T. B. Uther, Karina Valdivia...