Policy gradient methods for reinforcement learning avoid some of the undesirable properties of the value function approaches, such as policy degradation (Baxter and Bartlett, 2001...
Evan Greensmith, Peter L. Bartlett, Jonathan Baxte...
This paper describes the architecture of a computer system conceived as an intelligent assistant for public transport management. The goal of the system is to help operators of a c...
Spoken language is one of the most intuitive forms of interaction between humans and agents. Unfortunately, agents that interact with people using natural language often experienc...
We address the problem of optimally controlling stochastic environments that are partially observable. The standard method for tackling such problems is to define and solve a Part...
We consider the problem of optimal channel access to provide quality of service (QoS) for data transmission in cognitive vehicular networks. In such a network the vehicular nodes ...