Sciweavers

1176 search results - page 164 / 236
» Sparse reward processes
CLEF
2006
Springer
15 years 8 months ago
QolA: Fostering Collaboration Within QA
In this paper we suggest a QA pilot task, dubbed QolA, whose joint rationale is to allow for collaboration among systems, increase multilinguality and multicollection use, and investi...
Diana Santos, Luís Costa
AIPS
2007
15 years 6 months ago
Learning to Plan Using Harmonic Analysis of Diffusion Models
This paper summarizes research on a new emerging framework for learning to plan using the Markov decision process model (MDP). In this paradigm, two approaches to learning to plan...
Sridhar Mahadevan, Sarah Osentoski, Jeffrey Johns,...
AIPS
2000
15 years 5 months ago
On-line Scheduling via Sampling
We consider the problem of scheduling an unknown sequence of tasks for a single server as the tasks arrive with the goal of maximizing the total weighted value of the tasks serv...
Hyeong Soo Chang, Robert Givan, Edwin K. P. Chong
ATAL
2010
Springer
15 years 5 months ago
Planning against fictitious players in repeated normal form games
Planning how to interact against bounded memory and unbounded memory learning opponents needs different treatment. Thus far, however, work in this area has shown how to design pla...
Enrique Munoz de Cote, Nicholas R. Jennings
SIAMCOMP
2002
15 years 3 months ago
The Nonstochastic Multiarmed Bandit Problem
Abstract. In the multiarmed bandit problem, a gambler must decide which arm of K nonidentical slot machines to play in a sequence of trials so as to maximize his reward. This class...
Peter Auer, Nicolò Cesa-Bianchi, Yoav Freun...