Partially observable Markov decision process (POMDP) is commonly used to model a stochastic environment with unobservable states for supporting optimal decision making. Computing ...
Decentralized MDPs provide a powerful formal framework for planning in multi-agent systems, but the complexity of the model limits its usefulness. We study in this paper a class o...
Raphen Becker, Shlomo Zilberstein, Victor R. Lesse...
Reinforcement learning (RL) algorithms attempt to assign the credit for rewards to the actions that contributed to the reward. Thus far, credit assignment has been done in one of t...
Communication characterization of parallel applications is essential to understand the interplay between architectures and applications in determining the maximum achievable perfo...
Abstract. We formulate the problem of least squares temporal difference learning (LSTD) in the framework of least squares SVM (LS-SVM). To cope with the large amount (and possible ...