Search Sciweavers | Sciweavers

397 search results - page 52 / 80

» Reinforcement Learning with Hierarchies of Machines

ECML
2004
Springer

154 views, Machine Learning

Experiments in Value Function Approximation with Sparse Support Vector Regression

15 years 5 months ago

Download userweb.cs.utexas.edu

Abstract. We present first experiments using Support Vector Regression as function approximator for an on-line, SARSA-like reinforcement learner. To overcome the batch nature of S...

Tobias Jung, Thomas Uthmann

EMNLP
2011

164 views, Natural Language Processing

Watermarking the Outputs of Structured Prediction with an application in Statistical Machine Translation

13 years 11 months ago

Download cs.jhu.edu

We propose a general method to watermark and probabilistically identify the structured outputs of machine learning algorithms. Our method is robust to local editing operations and...

Ashish Venugopal, Jakob Uszkoreit, David Talbot, F...

ICML
2003
IEEE

124 views, Machine Learning

Exploration in Metric State Spaces

16 years 17 days ago

Download www.cis.upenn.edu

We present metric-E³, a provably near-optimal algorithm for reinforcement learning in Markov decision processes in which there is a natural metric on the state space that allows t...

Sham Kakade, Michael J. Kearns, John Langford

ICML
2009
IEEE

123 views, Machine Learning

Constraint relaxation in approximate linear programs

16 years 17 days ago

Download anytime.cs.umass.edu

Approximate Linear Programming (ALP) is a reinforcement learning technique with nice theoretical properties, but it often performs poorly in practice. We identify some reasons for...

Marek Petrik, Shlomo Zilberstein

ICML
2003
IEEE

146 views, Machine Learning

TD(0) Converges Provably Faster than the Residual Gradient Algorithm

16 years 17 days ago

Download www.hpl.hp.com

In Reinforcement Learning (RL) there has been some experimental evidence that the residual gradient algorithm converges slower than the TD(0) algorithm. In this paper, we use the ...

Ralf Schoknecht, Artur Merke
