Sciweavers

473 search results - page 82 / 95
» Optimal policy switching algorithms for reinforcement learni...
COLT
2010
Springer
14 years 7 months ago
Best Arm Identification in Multi-Armed Bandits
We consider the problem of finding the best arm in a stochastic multi-armed bandit game. The regret of a forecaster is here defined by the gap between the mean reward of the optim...
Jean-Yves Audibert, Sébastien Bubeck, Ré...
ECML
2007
Springer
15 years 3 months ago
Safe Q-Learning on Complete History Spaces
In this article, we present an idea for solving deterministic partially observable Markov decision processes (POMDPs) based on a history space containing sequences of past observat...
Stephan Timmer, Martin Riedmiller
PE
2011
Springer
14 years 4 months ago
Energy-aware routing in the Cognitive Packet Network
An energy aware routing protocol (EARP) is proposed to minimise a performance metric that combines the total consumed power in the network and the QoS that is specified for the fl...
Toktam Mahmoodi
EMO
2005
Springer
15 years 3 months ago
Multiobjective Water Pinch Analysis of the Cuernavaca City Water Distribution Network
Water systems often allow efficient water uses via water reuse and/or recirculation. Defining the network layout connecting water-using processes is a complex problem which involv...
Carlos E. Mariano-Romero, Víctor Alcocer-Ya...
ATAL
2007
Springer
15 years 3 months ago
Multiagent learning in adaptive dynamic systems
Classically, an approach to the multiagent policy learning supposed that the agents, via interactions and/or by using preliminary knowledge about the reward functions of all playe...
Andriy Burkov, Brahim Chaib-draa