Sciweavers

60 search results - page 9 / 12
» Iteratively Extending Time Horizon Reinforcement Learning
EWRL
2008
15 years 2 months ago
Markov Decision Processes with Arbitrary Reward Processes
We consider a control problem where the decision maker interacts with a standard Markov decision process with the exception that the reward functions vary arbitrarily ove...
Jia Yuan Yu, Shie Mannor, Nahum Shimkin
ATAL
2006
Springer
15 years 4 months ago
Learning to cooperate in multi-agent social dilemmas
In many Multi-Agent Systems (MAS), agents (even if self-interested) need to cooperate in order to maximize their own utilities. Most of the multi-agent learning algorithms focus on...
Jose Enrique Munoz de Cote, Alessandro Lazaric, Ma...
ESANN
2007
15 years 2 months ago
The Recurrent Control Neural Network
This paper presents our Recurrent Control Neural Network (RCNN), which is a model-based approach for a data-efficient modelling and control of reinforcement learning problems in di...
Anton Maximilian Schäfer, Steffen Udluft, Han...
COLT
2008
Springer
15 years 2 months ago
Adapting to a Changing Environment: the Brownian Restless Bandits
In the multi-armed bandit (MAB) problem there are k distributions associated with the rewards of playing each of k strategies (slot machine arms). The reward distributions are ini...
Aleksandrs Slivkins, Eli Upfal
IOR
2010
14 years 11 months ago
Dynamic Pricing with a Prior on Market Response
We study a problem of dynamic pricing faced by a vendor with limited inventory, uncertain about demand, aiming to maximize expected discounted revenue over an infinite time horiz...
Vivek F. Farias, Benjamin Van Roy