One of the key problems in reinforcement learning is balancing exploration and exploitation. Another is learning and acting in large or even continuous Markov decision processes (...
Lihong Li, Michael L. Littman, Christopher R. Mans...
Exotic semirings such as the “(max, +) semiring” (R ∪ {−∞}, max, +), or the “tropical semiring” (N ∪ {+∞}, min, +), have been invented and reinvented many times s...
This paper extends the framework of dynamic influence diagrams (DIDs) to the multi-agent setting. DIDs are computational representations of the Partially Observable Markov Decisio...
We are interested in contributing to solving effectively a particular type of real-time stochastic resource allocation problem. Firstly, one distinction is that certain tasks may c...
—We propose a dynamic spectrum access scheme where secondary users recommend “good” channels to each other and access accordingly. We formulate the problem as an average rewa...