In the multi-armed bandit (MAB) problem there are k distributions associated with the rewards of playing each of k strategies (slot machine arms). The reward distributions are ini...
Some supervised-learning algorithms can make effective use of domain knowledge in addition to the input-output pairs commonly used in machine learning. However, formulating this a...
Abstract. Learning to act in an unknown partially observable domain is a difficult variant of the reinforcement learning paradigm. Research in the area has focused on model-free m...
We tackle the fundamental problem of Bayesian active learning with noise, where we need to adaptively select from a number of expensive tests in order to identify an unknown hypot...
By learning a range of possible times over which the effect of an action can take place, a robot can reason more effectively about causal and contingent relationships in the world...