Sciweavers

252 search results - page 12 / 51
» Learning Partially Observable Action Models: Efficient Algor...
ECML
2007
Springer
15 years 5 months ago
Policy Gradient Critics
We present Policy Gradient Actor-Critic (PGAC), a new model-free Reinforcement Learning (RL) method for creating limited-memory stochastic policies for Partially Observable Markov ...
Daan Wierstra, Jürgen Schmidhuber
ICML
2008
IEEE
16 years 14 days ago
Efficiently learning linear-linear exponential family predictive representations of state
Exponential Family PSR (EFPSR) models capture stochastic dynamical systems by representing state as the parameters of an exponential family distribution over a short-term window of...
David Wingate, Satinder P. Singh
COLT
2008
Springer
15 years 1 months ago
Learning from Collective Behavior
Inspired by longstanding lines of research in sociology and related fields, and by more recent large-population human subject experiments on the Internet and the Web, we initiate a...
Michael Kearns, Jennifer Wortman
ECML
2007
Springer
15 years 5 months ago
Safe Q-Learning on Complete History Spaces
In this article, we present an idea for solving deterministic partially observable Markov decision processes (POMDPs) based on a history space containing sequences of past observat...
Stephan Timmer, Martin Riedmiller
COLT
2005
Springer
15 years 1 months ago
From External to Internal Regret
External regret compares the performance of an online algorithm, selecting among N actions, to the performance of the best of those actions in hindsight. Internal regret compares ...
Avrim Blum, Yishay Mansour