Sciweavers

» Terminating Decision Algorithms Optimally
NIPS
2007
15 years 5 months ago
Optimistic Linear Programming gives Logarithmic Regret for Irreducible MDPs
We present an algorithm called Optimistic Linear Programming (OLP) for learning to optimize average reward in an irreducible but otherwise unknown Markov decision process (MDP). O...
Ambuj Tewari, Peter L. Bartlett
MMNS
2004
15 years 5 months ago
Content-Based Adaptation of Streamed Multimedia
Most adaptive delivery mechanisms for streaming multimedia content do not explicitly consider user-perceived quality when making adaptation decisions. We show that an optimal adap...
Nicola Cranley, Liam Murphy, Philip Perry
NIPS
2001
15 years 5 months ago
Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning
Policy gradient methods for reinforcement learning avoid some of the undesirable properties of the value function approaches, such as policy degradation (Baxter and Bartlett, 2001...
Evan Greensmith, Peter L. Bartlett, Jonathan Baxte...
ASAP
2010
IEEE
15 years 4 months ago
ImpEDE: A multidimensional design-space exploration framework for biomedical-implant processors
The demand for biomedical implants keeps increasing. However, most of the current implant design methodologies involve custom-ASIC design. The SiMS project aims to chang...
Dhara Dave, Christos Strydis, Georgi Gaydadjiev
CDC
2010
IEEE
14 years 10 months ago
Q-learning and enhanced policy iteration in discounted dynamic programming
We consider the classical finite-state discounted Markovian decision problem, and we introduce a new policy iteration-like algorithm for finding the optimal state costs or Q-facto...
Dimitri P. Bertsekas, Huizhen Yu