In the Markov decision process (MDP) formalization of reinforcement learning, a single adaptive agent interacts with an environment defined by a probabilistic transition function....
We present new algorithms for reinforcement learning, and prove that they have polynomial bounds on the resources required to achieve near-optimal return in general Markov decisio...
Reinforcement learning is an effective machine learning paradigm in domains represented by compact and discrete state-action spaces. In high-dimensional and continuous domains, ti...
We prove a quantitative connection between the expected sum of rewards of a policy and binary classification performance on created subproblems. This connection holds without any ...
Stochastic games are a generalization of MDPs to multiple agents, and can be used as a framework for investigating multiagent learning. Hu and Wellman (1998) recently proposed a m...