We show that it is possible to use data compression on independently obtained hypotheses from various tasks to algorithmically provide guarantees that the tasks are sufficiently r...
We call data weakly labeled if it has no exact label but rather a numerical indication of correctness of the label "guessed" by the learning algorithm - a situation comm...
Product Distribution (PD) theory was recently developed as a framework for analyzing and optimizing distributed systems. In this paper we demonstrate its use for adaptive distribu...
We propose a new approach to value function approximation which combines linear temporal difference reinforcement learning with subspace identification. In practical applications...
This paper investigates a novel model-free reinforcement learning architecture, the Natural Actor-Critic. The actor updates are based on stochastic policy gradients employing Amari...