Search Sciweavers | Sciweavers

288 search results - page 5 / 58

» Learning to Play Chess Using Temporal Differences

NIPS
2008

130 views, Information Technology

Temporal Difference Based Actor Critic Learning - Convergence and Neural Implementation

13 years 7 months ago

Download eprints.pascal-network.org

Actor-critic algorithms for reinforcement learning are achieving renewed popularity due to their good convergence properties in situations where other approaches often fail (e.g.,...

Dotan Di Castro, Dmitry Volkinshtein, Ron Meir

ICCBR
2010
Springer

274 views, Automated Reasoning

Reducing the Memory Footprint of Temporal Difference Learning over Finitely Many States by Using Case-Based Generalization

13 years 9 months ago

Download www.cse.lehigh.edu

In this paper we present an approach for reducing the memory footprint requirement of temporal difference methods in which the set of states is finite. We use case-based generaliza...

Matt Dilts, Héctor Muñoz-Avila

AAAI
2010

160 views, Intelligent Agents

A Temporal Proof System for General Game Playing

13 years 7 months ago

Download cgi.cse.unsw.edu.au

A general game player is a system that understands the rules of unknown games and learns to play these games well without human intervention. A major challenge for research in Gen...

Michael Thielscher, Sebastian Voigt

ML
1998
ACM

136 views, Machine Learning

Co-Evolution in the Successful Learning of Backgammon Strategy

13 years 5 months ago

Download www.demo.cs.brandeis.edu

Following Tesauro’s work on TD-Gammon, we used a 4000 parameter feed-forward neural network to develop a competitive backgammon evaluation function. Play proceeds by a roll of t...

Jordan B. Pollack, Alan D. Blair

ML
2002
ACM

154 views, Machine Learning

Technical Update: Least-Squares Temporal Difference Learning

13 years 5 months ago

Download www.research.rutgers.edu

TD(λ) is a popular family of algorithms for approximate policy evaluation in large MDPs. TD(λ) works by incrementally updating the value function after each observed transition. It h...

Justin A. Boyan
