Search Sciweavers | Sciweavers

252 search results - page 12 / 51

» Learning Partially Observable Action Models: Efficient Algor...

ECML
2007
Springer

192 views, Machine Learning

Policy Gradient Critics

15 years 9 months ago

Download www.idsia.ch

We present Policy Gradient Actor-Critic (PGAC), a new model-free Reinforcement Learning (RL) method for creating limited-memory stochastic policies for Partially Observable Markov ...

Daan Wierstra, Jürgen Schmidhuber

ICML
2008
IEEE

157 views, Machine Learning

Efficiently learning linear-linear exponential family predictive representations of state

16 years 3 months ago

Download web.mit.edu

Exponential Family PSR (EFPSR) models capture stochastic dynamical systems by representing state as the parameters of an exponential family distribution over a short-term window of...

David Wingate, Satinder P. Singh

COLT
2008
Springer

105 views, Machine Learning

Learning from Collective Behavior

15 years 4 months ago

Download colt2008.cs.helsinki.fi

Inspired by longstanding lines of research in sociology and related fields, and by more recent large-population human subject experiments on the Internet and the Web, we initiate a...

Michael Kearns, Jennifer Wortman

ECML
2007
Springer

108 views, Machine Learning

Safe Q-Learning on Complete History Spaces

15 years 9 months ago

Download www.ni.uos.de

In this article, we present an idea for solving deterministic partially observable Markov decision processes (POMDPs) based on a history space containing sequences of past observat...

Stephan Timmer, Martin Riedmiller

COLT
2005
Springer

128 views, Machine Learning

From External to Internal Regret

15 years 5 months ago

Download www.cs.cmu.edu

External regret compares the performance of an online algorithm, selecting among N actions, to the performance of the best of those actions in hindsight. Internal regret compares ...

Avrim Blum, Yishay Mansour
