We examine the problem of evaluating a policy in the contextual bandit setting using only observations collected during the execution of another policy. We show that policy evalua...
John Langford, Alexander L. Strehl, Jennifer Wortm...
While circuit density and power efficiency increase with each major advance in IC technology, reliability with respect to soft errors tends to decrease. Current solutions to this...
Smita Krishnaswamy, Stephen Plaza, Igor L. Markov,...
Recommender systems are used by an increasing number of e-commerce websites to help the customers to find suitable products from a large database. One of the most popular techniqu...
Stefan Hauger, Karen H. L. Tso, Lars Schmidt-Thiem...
In reinforcement learning, least-squares temporal difference methods (e.g., LSTD and LSPI) are effective, data-efficient techniques for policy evaluation and control with linear v...
Michael H. Bowling, Alborz Geramifard, David Winga...
Despite the existence of several noun phrase coreference resolution data sets as well as several formal evaluations on the task, it remains frustratingly difficult to compare resu...