Sciweavers

» Markov Decision Processes with Arbitrary Reward Processes
ICML
2009
IEEE
16 years 2 months ago
Piecewise-stationary bandit problems with side observations
We consider a sequential decision problem where the rewards are generated by a piecewise-stationary distribution. However, the different reward distributions are unknown and may c...
Jia Yuan Yu, Shie Mannor
NIPS
2001
15 years 3 months ago
The Steering Approach for Multi-Criteria Reinforcement Learning
We consider the problem of learning to attain multiple goals in a dynamic environment, which is initially unknown. In addition, the environment may contain arbitrarily varying ele...
Shie Mannor, Nahum Shimkin
AAAI
2007
15 years 4 months ago
Authorial Idioms for Target Distributions in TTD-MDPs
In designing Markov Decision Processes (MDP), one must define the world, its dynamics, a set of actions, and a reward function. MDPs are often applied in situations where there i...
David L. Roberts, Sooraj Bhat, Kenneth St. Clair, ...
NIPS
2007
15 years 3 months ago
Optimistic Linear Programming gives Logarithmic Regret for Irreducible MDPs
We present an algorithm called Optimistic Linear Programming (OLP) for learning to optimize average reward in an irreducible but otherwise unknown Markov decision process (MDP). O...
Ambuj Tewari, Peter L. Bartlett
WSC
2008
15 years 4 months ago
On step sizes, stochastic shortest paths, and survival probabilities in Reinforcement Learning
Reinforcement Learning (RL) is a simulation-based technique useful in solving Markov decision processes if their transition probabilities are not easily obtainable or if the probl...
Abhijit Gosavi