We consider the problem of estimating the policy gradient in Partially Observable Markov Decision Processes (POMDPs) with a special class of policies that are based on Predictive ...
Abstract— Research on numerical solution methods for partially observable Markov decision processes (POMDPs) has primarily focused on discrete-state models, and these algorithms ...
As the hardware and software complexity grows, it is unlikely for the power management hardware/software to have a full observation of the entire system status. In this paper, we ...
Abstract. We consider a continuous-time model for inventory management with Markov modulated non-stationary demands. We introduce active learning by assuming that the state of the ...
We consider a variant of the classic multi-armed bandit problem (MAB), which we call FEEDBACK MAB, where the reward obtained by playing each of n independent arms varies according...