A central problem in learning in complex environmentsis balancing exploration of untested actions against exploitation of actions that are known to be good. The benefit of explora...
In sequence modeling, we often wish to represent complex interaction between labels, such as when performing multiple, cascaded labeling tasks on the same sequence, or when longra...
Charles A. Sutton, Khashayar Rohanimanesh, Andrew ...
We consider reinforcement learning as solving a Markov decision process with unknown transition distribution. Based on interaction with the environment, an estimate of the transit...
We consider a two-layer network algorithm. The first layer consists of an uncountable number of linear units. Each linear unit is an LMS algorithm whose inputs are first “kerne...
Statistical selection procedures can identify the best of a finite set of alternatives, where “best” is defined in terms of the unknown expected value of each alternative’...