We consider a bandit problem which involves sequential sampling from two populations (arms). Each arm produces a noisy reward realization which depends on an observable random cov...
We describe a general method to transform a non-Markovian sequential decision problem into a supervised learning problem using a K-best paths algorithm. We consider an a...
This paper merges hierarchical reinforcement learning (HRL) with ant colony optimization (ACO) to produce an HRL ACO algorithm capable of generating solutions for large domains. Th...
We develop a learning component for a theorem prover designed for verifying statements by mathematical induction. If the prover has found a proof, it is analyzed yielding a so-ca...
The widespread use of artificial neural networks and the difficulty of correctly specifying (tuning) their parameters for a given problem are the main aspects that mot...