hsp and hspr are two recent planners that search the state-space using an heuristic function extracted from Strips encodings. hsp does a forward search from the initial state reco...
—This paper introduces an algorithm for direct search of control policies in continuous-state discrete-action Markov decision processes. The algorithm looks for the best closed-l...
Lucian Busoniu, Damien Ernst, Bart De Schutter, Ro...
Coevolutionary algorithms search for test cases as part of the search process. The resulting adaptive evaluation function takes away the need to define a fixed evaluation function...
Frans A. Oliehoek, Edwin D. de Jong, Nikos A. Vlas...
Abstract— Q-learning is a technique used to compute an optimal policy for a controlled Markov chain based on observations of the system controlled using a non-optimal policy. It ...
We address the problem of optimally controlling stochastic environments that are partially observable. The standard method for tackling such problems is to define and solve a Part...