Abstract—We analyze the performance limits of data dissemination with multi-channel, single radio sensors. We formulate the problem of minimizing the average delay of data dissem...
David Starobinski, Weiyao Xiao, Xiangping Qin, Ari...
Reinforcement learning induces non-stationarity at several levels. Adaptation to non-stationary environments is, of course, a desirable feature of a fair RL algorithm. Yet, even if the...
We consider approximate policy evaluation for finite state and action Markov decision processes (MDPs) in the off-policy learning context and with the simulation-based least square...
Existing value function approximation methods have been successfully used in many applications, but they often lack useful a priori error bounds. We propose a new approximate bili...
An ever-increasing amount of information on the Web today is available only through search interfaces: users have to type a set of keywords into a search form in order to acc...