Sciweavers

56 search results - page 7 / 12
» Q-Learning in Continuous State and Action Spaces
WDAG 2007 (Springer)
On Self-stabilizing Synchronous Actions Despite Byzantine Attacks
Consider a distributed network of n nodes that is connected to a global source of “beats”. All nodes receive the “beats” simultaneously, and operate in lock-step. A scheme ...
Danny Dolev, Ezra N. Hoch
ICML 2007 (IEEE)
Constructing basis functions from directed graphs for value function approximation
Basis functions derived from an undirected graph connecting nearby samples from a Markov decision process (MDP) have proven useful for approximating value functions. The success o...
Jeffrey Johns, Sridhar Mahadevan
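
The snippet above refers to the standard graph-based construction that this paper extends: sampled states are connected into a neighbourhood graph, and low-order eigenvectors of the graph Laplacian serve as basis functions for value function approximation. A minimal sketch of the undirected-graph baseline mentioned in the abstract, with illustrative choices for the state samples, neighbour count, and number of basis functions (not taken from the paper):

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.linalg import eigh

def laplacian_basis(states, k_neighbors=5, num_basis=10):
    """Eigenvectors of the normalized Laplacian of a k-NN graph over sampled states."""
    n = len(states)
    dists = cdist(states, states)                       # pairwise Euclidean distances
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(dists[i])[1:k_neighbors + 1]  # nearest neighbours, skipping self
        W[i, nbrs] = 1.0
    W = np.maximum(W, W.T)                              # symmetrize -> undirected graph
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    L = np.eye(n) - D_inv_sqrt @ W @ D_inv_sqrt         # normalized graph Laplacian
    _, eigvecs = eigh(L)                                # eigenvalues in ascending order
    return eigvecs[:, :num_basis]                       # smoothest eigenvectors as features

# Usage: approximate the value function linearly, V(s_i) ~= Phi[i] @ w, on the samples.
states = np.random.rand(200, 2)                         # illustrative 2-D state samples
Phi = laplacian_basis(states)                           # 200 x 10 feature matrix
```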

Publication
Algorithms and Bounds for Rollout Sampling Approximate Policy Iteration
Abstract: Several approximate policy iteration schemes without value functions, which focus on policy representation using classifiers and address policy learning as a supervis...
Christos Dimitrakakis, Michail G. Lagoudakis
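
The abstract points to classifier-based approximate policy iteration: Monte Carlo rollouts of the current policy are used to estimate which action is best at a set of sampled states, and a classifier trained on those (state, best action) pairs represents the improved policy. A minimal sketch of one such iteration, assuming a hypothetical generative model `env.sample(s, a)` that returns `(next_state, reward)` and a small discrete action set; the estimator and all names are illustrative, not the authors' algorithm:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def rollout_return(env, state, action, policy, horizon=30, gamma=0.95):
    """Monte Carlo estimate of Q(state, action): take `action`, then follow `policy`."""
    total, discount = 0.0, 1.0
    s, a = state, action
    for _ in range(horizon):
        s, r = env.sample(s, a)          # hypothetical generative model
        total += discount * r
        discount *= gamma
        a = policy(s)
    return total

def policy_iteration_step(env, policy, sampled_states, actions, n_rollouts=10):
    """Train a classifier to imitate the greedy improvement of `policy`."""
    X, y = [], []
    for s in sampled_states:
        q = [np.mean([rollout_return(env, s, a, policy) for _ in range(n_rollouts)])
             for a in actions]
        X.append(s)
        y.append(int(np.argmax(q)))      # index of the empirically best action
    clf = DecisionTreeClassifier().fit(np.array(X), np.array(y))
    return lambda s: actions[clf.predict(np.array(s).reshape(1, -1))[0]]
```
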
NIPS 2008
Particle Filter-based Policy Gradient in POMDPs
Our setting is a Partially Observable Markov Decision Process with continuous state, observation and action spaces. Decisions are based on a Particle Filter for estimating the bel...
Pierre-Arnaud Coquelin, Romain Deguest, Rém...
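
The abstract bases decisions on a particle filter that tracks the belief over the continuous state. A minimal bootstrap particle-filter update, assuming user-supplied functions `transition(s, a)` and `obs_likelihood(o, s)` (both hypothetical names); the policy-gradient part of the paper is not shown:

```python
import numpy as np

def particle_filter_step(particles, action, observation, transition, obs_likelihood):
    """One bootstrap particle-filter update of the belief over a continuous state.

    particles:            (N, state_dim) array approximating the current belief
    transition(s, a):     samples the next state (assumed dynamics model)
    obs_likelihood(o, s): likelihood of observation o given state s
    """
    # Propagate every particle through the stochastic dynamics.
    predicted = np.array([transition(s, action) for s in particles])
    # Weight particles by how well they explain the new observation.
    weights = np.array([obs_likelihood(observation, s) for s in predicted])
    weights = weights / weights.sum()    # assumes at least one particle fits the observation
    # Resample to return to an equally weighted particle set.
    idx = np.random.choice(len(predicted), size=len(predicted), p=weights)
    return predicted[idx]
```
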
ICML 2004 (IEEE)
Learning to fly by combining reinforcement learning with behavioural cloning
Reinforcement learning deals with learning optimal or near-optimal policies while interacting with the environment. Application domains with many continuous variables are difficul...
Eduardo F. Morales, Claude Sammut
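
The abstract combines reinforcement learning with behavioural cloning for domains with many continuous variables. A minimal sketch of one common way to combine the two: clone a policy from demonstration traces by supervised learning, then let a tabular learner explore with the cloned policy instead of uniformly random actions. The environment API (`env.reset`, `env.step`), the discretization, and the choice of learners are illustrative assumptions, not the authors' specific method:

```python
import numpy as np
from collections import defaultdict
from sklearn.neighbors import KNeighborsClassifier

# 1) Behavioural cloning: supervised learning on demonstration (state, action) pairs.
def clone_policy(demo_states, demo_actions):
    return KNeighborsClassifier(n_neighbors=3).fit(demo_states, demo_actions)

# 2) Q-learning over a coarse discretization, exploring via the cloned policy.
def q_learning_with_cloning(env, cloned, discretize, actions, episodes=500,
                            alpha=0.1, gamma=0.99, epsilon=0.2):
    Q = defaultdict(float)
    for _ in range(episodes):
        s = env.reset()                                   # hypothetical env API
        done = False
        while not done:
            key = discretize(s)
            if np.random.rand() < epsilon:
                a = cloned.predict([s])[0]                # guided exploration via the clone
            else:
                a = max(actions, key=lambda a_: Q[(key, a_)])
            s2, r, done = env.step(a)                     # hypothetical env API
            best_next = max(Q[(discretize(s2), a_)] for a_ in actions)
            Q[(key, a)] += alpha * (r + gamma * best_next - Q[(key, a)])
            s = s2
    return Q
```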