In this paper we suggest a QA pilot task, dubbed QolA, whose joint rationale is allow for collaboration among systems, increase multilinguality and multicollection use, and investi...
This paper summarizes research on a new emerging framework for learning to plan using the Markov decision process model (MDP). In this paradigm, two approaches to learning to plan...
Sridhar Mahadevan, Sarah Osentoski, Jeffrey Johns,...
1 We consider the problem of scheduling an unknown sequence of tasks for a single server as the tasks arrive with the goal off maximizing the total weighted value of the tasks serv...
Planning how to interact against bounded memory and unbounded memory learning opponents needs different treatment. Thus far, however, work in this area has shown how to design pla...
Abstract. In the multiarmed bandit problem, a gambler must decide which arm of K nonidentical slot machines to play in a sequence of trials so as to maximize his reward. This class...