Sciweavers

AI
2006
Springer
13 years 8 months ago
Belief Selection in Point-Based Planning Algorithms for POMDPs
Current point-based planning algorithms for solving partially observable Markov decision processes (POMDPs) have demonstrated that a good approximation of the value funct...
Masoumeh T. Izadi, Doina Precup, Danielle Azar
ICML
1996
IEEE
13 years 8 months ago
A Convergent Reinforcement Learning Algorithm in the Continuous Case: The Finite-Element Reinforcement Learning
This paper presents a direct reinforcement learning algorithm, called Finite-Element Reinforcement Learning, in the continuous case, i.e. continuous state-space and time. The eval...
Rémi Munos
ICML
2003
IEEE
13 years 9 months ago
The Significance of Temporal-Difference Learning in Self-Play Training: TD-Rummy versus EVO-rummy
Reinforcement learning has been used for training game playing agents. The value function for a complex game must be approximated with a continuous function because the number of ...
Clifford Kotnik, Jugal K. Kalita
SAINT
2003
IEEE
13 years 9 months ago
A Generalized Target-Driven Cache Replacement Policy for Mobile Environments
Caching frequently accessed data items on the client side is an effective technique to improve the system performance in wireless networks. Due to cache size limitations, cache re...
Liangzhong Yin, Guohong Cao, Ying Cai
CCIA
2005
Springer
13 years 10 months ago
Direct Policy Search Reinforcement Learning for Robot Control
This paper proposes a high-level Reinforcement Learning (RL) control system for solving the action selection problem of an autonomous robot. Although the dominant approach, whe...
Andres El-Fakdi, Marc Carreras, Narcís Palo...
ATAL
2005
Springer
13 years 10 months ago
Behavior transfer for value-function-based reinforcement learning
Temporal difference (TD) learning methods [22] have become popular reinforcement learning techniques in recent years. TD methods have had some experimental successes and have been...
Matthew E. Taylor, Peter Stone
ICML
2006
IEEE
13 years 10 months ago
Automatic basis function construction for approximate dynamic programming and reinforcement learning
We address the problem of automatically constructing basis functions for linear approximation of the value function of a Markov Decision Process (MDP). Our work builds on results ...
Philipp W. Keller, Shie Mannor, Doina Precup
ATAL
2007
Springer
13 years 10 months ago
On opportunistic techniques for solving decentralized Markov decision processes with temporal constraints
Decentralized Markov Decision Processes (DEC-MDPs) are a popular model of agent-coordination problems in domains with uncertainty and time constraints but very difficult to solve...
Janusz Marecki, Milind Tambe
IROS
2009
IEEE
13 years 11 months ago
Active learning using mean shift optimization for robot grasping
When children learn to grasp a new object, they often know several possible grasping points from observing a parent’s demonstration and subsequently learn better grasps by tr...
Oliver Kroemer, Renaud Detry, Justus H. Piater, Ja...
ICRA
2009
IEEE
13 years 11 months ago
Adaptive autonomous control using online value iteration with gaussian processes
In this paper, we present a novel approach to controlling a robotic system online from scratch based on the reinforcement learning principle. In contrast to other approaches, o...
Axel Rottmann, Wolfram Burgard