We present JoSTLe, an algorithm that performs value iteration on control problems with continuous actions, allowing this useful reinforcement learning technique to be applied to p...
Christopher K. Monson, David Wingate, Kevin D. Sep...
In some environments, a learning agent must learn to balance competing objectives. For example, a Q-learning agent may need to learn which choices expose the agent to risk and whic...