Two distinct learning mechanisms are considered for a population of agents who engage in decentralized search for the common optimum. An agent may choose to learn via innovation (...
We examine the problem of evaluating a policy in the contextual bandit setting using only observations collected during the execution of another policy. We show that policy evalua...
John Langford, Alexander L. Strehl, Jennifer Wortm...
We discuss two approaches for choosing a strategy in a two-player game. We suppose that the game is played a large number of rounds, which allows the players to use observations o...
Dzung T. Hoang, Philip M. Long, Jeffrey Scott Vitter. Department of Computer Science, Duke University, Box 90129, Durham, NC 27708-0129. We compare methods for choosing motion ...
Dzung T. Hoang, Philip M. Long, Jeffrey Scott Vitt...