We present an asymptotically optimal algorithm for the max variant of the k-armed bandit problem. Given a set of k slot machines, each yielding payoff from a fixed (but unknown) d...
R-max is a very simple model-based reinforcement learning algorithm which can attain near-optimal average reward in polynomial time. In R-max, the agent always maintains a complet...
We present a general, consistency-based framework for belief change. Informally, in revising K by α, we begin with α and incorporate as much of K as consistently possible. Formally, ...
We present a noisy-OR Bayesian network model for simulation-based training, and an efficient search-based algorithm for automatic synthesis of plausible training scenarios from co...
Eugene Grois, William H. Hsu, Mikhail Voloshin, Da...
In many important text classification problems, acquiring class labels for training documents is costly, while gathering large quantities of unlabeled data is cheap. This paper sh...
Kamal Nigam, Andrew McCallum, Sebastian Thrun, Tom...