Sciweavers

162 search results - page 12 / 33
» Off-Policy Temporal Difference Learning with Function Approx...
PKDD
2009
Springer
15 years 4 months ago
Hybrid Least-Squares Algorithms for Approximate Policy Evaluation
The goal of approximate policy evaluation is to “best” represent a target value function according to a specific criterion. Temporal difference methods and Bellman residual m...
Jeffrey Johns, Marek Petrik, Sridhar Mahadevan
AAAI
2008
14 years 11 months ago
Adaptive Importance Sampling with Automatic Model Selection in Value Function Approximation
Off-policy reinforcement learning is aimed at efficiently reusing data samples gathered in the past, which is an essential problem for physically grounded AI as experiments are us...
Hirotaka Hachiya, Takayuki Akiyama, Masashi Sugiya...
ICMLA
2008
14 years 11 months ago
Basis Function Construction in Reinforcement Learning Using Cascade-Correlation Learning Architecture
In reinforcement learning, it is a common practice to map the state(-action) space to a different one using basis functions. This transformation aims to represent the input data i...
Sertan Girgin, Philippe Preux
SOFTWARE
2002
14 years 9 months ago
Temporal Probabilistic Concepts from Heterogeneous Data Sequences
We consider the problem of characterisation of sequences of heterogeneous symbolic data that arise from a common underlying temporal pattern. The data, which are subject to impreci...
Sally I. McClean, Bryan W. Scotney, Fiona Palmer
EH
1999
IEEE
15 years 1 months ago
Evolvable Hardware or Learning Hardware? Induction of State Machines from Temporal Logic Constraints
Here we advocate an approach to learning hardware based on induction of finite state machines from temporal logic constraints. The method involves training on examples, constraint...
Marek A. Perkowski, Alan Mishchenko, Anatoli N. Ch...