The multi-armed bandit problem is a typical example of the dilemma between exploration and exploitation in reinforcement learning. This problem is expressed as a model of a gambler playi...
This is a summary of the main results presented in the author's PhD thesis, supervised by D. Conforti and P. Beraldi and defended in March 2005. The thesis, written in English...
Computing a good policy in stochastic uncertain environments with unknown dynamics and reward model parameters is a challenging task. In a number of domains, ranging from space ro...
Abstract. We explore a new general-purpose heuristic for finding high-quality solutions to hard optimization problems. The method, called extremal optimization, is inspired by self-or...
Stefan Boettcher, Allon G. Percus, Michelangelo Gr...
Markov Decision Processes are a powerful framework for planning under uncertainty, but current algorithms have difficulties scaling to large problems. We present a novel probabil...