We introduce the first algorithm for off-policy temporal-difference learning that is stable with linear function approximation. Off-policy learning is of interest because it forms...
Abstract. We study the problem of applying statistical methods for approximate model checking of probabilistic systems against properties encoded as PCTL formulas. Such approximate...
In this paper, we give the rst constant-factor approximationalgorithmfor the rooted Orienteering problem, as well as a new problem that we call the Discounted-Reward TSP, motivate...
Avrim Blum, Shuchi Chawla, David R. Karger, Terran...
—This paper considers the design of opportunistic packet schedulers for users sharing a time-varying wireless channel from the performance and the robustness points of view. Firs...
Many multimedia applications rely on the computation of logarithms, for example, when estimating log-likelihoods for Gaussian Mixture Models. Knowing of the demand to compute loga...