Search Sciweavers | Sciweavers

651 search results - page 74 / 131

» Algorithms for Inverse Reinforcement Learning

click to vote

ESANN
2003

152views Neural Networks» more ESANN 2003»

Improving iterative repair strategies for scheduling with the SVM

15 years 1 months ago

Download www2.in.tu-clausthal.de

The resource constraint project scheduling problem (RCPSP) is an NP-hard benchmark problem in scheduling which takes into account the limitation of resources’ availabilities in ...

Kai Gersmann, Barbara Hammer

claim paper

Read More »

122

click to vote

ML
2008
ACM

152views Machine Learning» more ML 2008»

Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path

14 years 12 months ago

Download hal.inria.fr

Abstract. We consider batch reinforcement learning problems in continuous space, expected total discounted-reward Markovian Decision Problems. As opposed to previous theoretical wo...

András Antos, Csaba Szepesvári, R&ea...

claim paper

Read More »

103

Voted

ICMLA
2010

203views Machine Learning» more ICMLA 2010»

Multimodal Parameter-exploring Policy Gradients

14 years 9 months ago

Download www6.in.tum.de

Abstract-- Policy Gradients with Parameter-based Exploration (PGPE) is a novel model-free reinforcement learning method that alleviates the problem of high-variance gradient estima...

Frank Sehnke, Alex Graves, Christian Osendorfer, J...

claim paper

Read More »

click to vote

ICIP
2010
IEEE

164views Image Processing» more ICIP 2010»

Stochastic gradient descent for robust inverse photomask synthesis in optical lithography

14 years 9 months ago

Download www.eee.hku.hk

Optical lithography is a critical step in the semiconductor manufacturing process, and one key problem is the design of the photomask for a particular circuit pattern, given the o...

Ningning Jia, Edmund Y. Lam

claim paper

Read More »

click to vote

ATAL
2008
Springer

123views Intelligent Agents» more ATAL 2008»

Sigma point policy iteration

15 years 1 months ago

Download web.mit.edu

In reinforcement learning, least-squares temporal difference methods (e.g., LSTD and LSPI) are effective, data-efficient techniques for policy evaluation and control with linear v...

Michael H. Bowling, Alborz Geramifard, David Winga...

claim paper

Read More »

« Prev « First page 74 / 131 Last » Next »

Sciweavers

Explore & Download

Productivity Tools

Document Tools

Image Tools

Sciweavers