Search Sciweavers | Sciweavers

1176 search results - page 4 / 236

» Sparse reward processes

ICML
2009
IEEE

109 views, Machine Learning

Piecewise-stationary bandit problems with side observations

16 years 8 months ago

Download www.cim.mcgill.ca

We consider a sequential decision problem where the rewards are generated by a piecewise-stationary distribution. However, the different reward distributions are unknown and may c...

Jia Yuan Yu, Shie Mannor

CORR
2011
Springer

183 views, Education

Mean-Variance Optimization in Markov Decision Processes

15 years 2 months ago

Download web.mit.edu

We consider finite horizon Markov decision processes under performance measures that involve both the mean and the variance of the cumulative reward. We show that either randomiz...

Shie Mannor, John N. Tsitsiklis

ICRA
2010
IEEE

137 views, Robotics

Robot reinforcement learning using EEG-based reward signals

15 years 6 months ago

Download webdiis.unizar.es

Reinforcement learning algorithms have been successfully applied in robotics to learn how to solve tasks based on reward signals obtained during task execution. These r...

Iñaki Iturrate, Luis Montesano, Javier Ming...

AAAI
2011

149 views, Intelligent Agents

Optimal Rewards versus Leaf-Evaluation Heuristics in Planning Agents

14 years 7 months ago

Download www.eecs.umich.edu

Planning agents often lack the computational resources needed to build full planning trees for their environments. Agent designers commonly overcome this finite-horizon approxima...

Jonathan Sorg, Satinder P. Singh, Richard L. Lewis

ICML
2004
IEEE

214 views, Machine Learning

Apprenticeship learning via inverse reinforcement learning

16 years 8 months ago

Download ai.stanford.edu

We consider learning in a Markov decision process where we are not explicitly given a reward function, but where instead we can observe an expert demonstrating the task that we wa...

Pieter Abbeel, Andrew Y. Ng
