The unichain classification problem detects whether a finite state and action MDP is unichain under all deterministic policies. This problem is NP-hard [11]. This paper provides p...
We study a subclass of POMDPs, called quasi-deterministic POMDPs (QDET-POMDPs), characterized by deterministic actions and stochastic observations. While this framework does not mo...
In this paper, we study a particular subclass of partially observable models, called quasi-deterministic partially observable Markov decision processes (QDET-POMDPs), characterize...
Policy teaching considers a Markov Decision Process setting in which an interested party aims to influence an agent’s decisions by providing limited incentives. In this paper, ...
We consider the problem of estimating the policy gradient in Partially Observable Markov Decision Processes (POMDPs) with a special class of policies that are based on Predictive ...